Observability Engineer

ripplehire

Bangalore 5 Years Exp Posted 82d ago

Job Description

Key Responsibilities

· Observability Strategy: Define and execute a full-stack observability roadmap aligned with business and IT goals, embedding AIOps and SRE principles.

· Monitoring Frameworks: Design and implement comprehensive monitoring solutions for applications, infrastructure, and networks to ensure continuous performance and availability.

· Data Analysis & Insights: Use AIOps-driven analytics to identify trends, predict failures, and automate corrective actions.

· Tool Ownership & Integration: Manage and optimize observability tools (Splunk, Datadog, Prometheus, Grafana, ThousandEyes, ServiceNow AIOps, etc.), integrating them across hybrid environments.

· Automation & Intelligence: Develop automated workflows for alerting, incident detection, and root cause analysis using scripting and AI-driven approaches.

· Dashboarding & Reporting: Build intelligent dashboards and provide actionable insights to stakeholders on system health, incidents, and performance improvements.

· Incident & Problem Management: Partner with ITSM teams to enhance detection, triage, and resolution workflows with AI-assisted root cause analysis.

· Continuous Improvement: Stay updated with emerging observability and AIOps technologies, integrating them to enhance monitoring capabilities.

Qualifications

· 7 to 10 years in IT infrastructure, monitoring, and observability roles.

· Strong experience in AIOps platforms and applying AI/ML for monitoring, anomaly detection, and predictive analytics.

· Expertise with observability tools: Datadog, OpManager, Splunk, Dynatrace, AppDynamics, New Relic, Prometheus, Grafana, Nagios, etc.

· Familiarity with cloud-native monitoring across AWS, Azure, GCP, and on-premises data centers.

· Proficiency in scripting/automation (Python, Shell, PowerShell, Ansible).

· Experience with DevOps and cloud-native environments (Kubernetes, Docker, Terraform, CI/CD pipelines).

· Knowledge of database monitoring (SQL and NoSQL).

· Strong analytical and problem-solving skills for proactive detection and resolution.

· Excellent communication and collaboration skills to work across IT Ops, DevOps, Security, and Application teams.

· Experience presenting monitoring insights and observability metrics to executives and stakeholders.

· Solid foundation in networking and Linux administration.

· Experience with Atlassian tooling (Jira, Confluence) preferred.

· Certifications (ITIL, DevOps, AWS, Azure, GCP, Agile, PMP) are a plus.
