Senior Observability Engineer
arm
Job Description
Responsibilities:
- Contribute to the design and development of comprehensive observability and monitoring strategies for infrastructure and sophisticated engineering systems and applications.
- Build and manage monitoring tools and platforms such as Prometheus, Grafana, Azure Monitor, AWS CloudWatch, Dynatrace/Datadog and similar tools that form our AIOps stack.
- Develop and maintain dashboards, alerts, and reports to provide real-time insights into system performance and health.
- Collaborate with cross-functional teams to identify and resolve performance bottlenecks and reliability issues.
- Automate monitoring and alerting processes to improve efficiency and reduce manual intervention.
- Conduct root cause analysis of incidents and implement preventive measures to avoid recurrence.
- Mentor and guide junior engineers in standard methodologies for observability and monitoring.
- Stay up to date with the latest industry trends and technologies to continuously improve our monitoring capabilities.
Required Skills and Experience:
- Bachelor’s degree in Computer Science, Engineering, or a related field, with demonstrated experience in observability and monitoring roles.
- Proficiency in monitoring tools and platforms such as Prometheus, Grafana, AWS CloudWatch, Azure Monitor, Datadog, and Dynatrace.
- Strong understanding of cloud environments (AWS, Azure, GCP), containerization (Docker), and container orchestration (Kubernetes).
- Experience with scripting and automation in languages such as Python or Bash.
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork skills.
- Ability to work in a fast-paced, multifaceted environment.