Site Reliability Engineer - VP

Citi

Pune | 10 Years Exp | Posted 412d ago

Job Description

Key Responsibilities:

  • Deliver against the observability roadmap for Services Technology by building scalable, reusable telemetry solutions.

  • Create and maintain dashboards and visualizations for critical client journeys, including real-time flows across Payments.

  • Guide line-of-business teams in implementing SLIs/SLOs, golden signals, and effective alerting to support operational excellence.

  • Support integration and adoption of observability tooling across on-prem, public cloud (AWS/GCP), and containerized environments (ECS, Kubernetes).

  • Customize shared dashboards and observability components in partnership with CTI and other central Engineering functions, ensuring usability and flexibility.

  • Provide technical support and implementation guidance to SREs and developers facing integration or tooling challenges.

  • Manage the observability book of work for Services Technology and drive initiatives to reduce mean time to detect (MTTD) and improve recovery outcomes.

  • Serve as a key connection point between line-of-business SREs and central infrastructure functions by gathering tooling feedback, surfacing systemic issues, and influencing platform enhancements via the Services Observability Forum.

  • Stay current with observability trends, including AI/ML-driven insights, anomaly detection, and emerging OSS practices, and assess their applicability.

  • Maintain strong knowledge of observability platform features and vendor offerings to advise teams and maximize the value of tooling investments.

 

Qualifications:

  • 10+ years of experience in SRE, Observability Engineering, or platform infrastructure roles focused on operational telemetry.

  • Hands-on experience with observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, and Splunk.

  • Deep understanding of SLIs, SLOs, error budgets, and telemetry best practices in high-availability environments.

  • Proven ability to troubleshoot integration issues and support observability across hybrid platforms (on-prem, cloud, containers).

  • Experience building dashboards aligned to business outcomes and incident workflows, especially in critical flows like payments.

  • Familiarity with modern observability tooling ecosystems, including AI/ML capabilities, trace correlation, baselining, and alert tuning.

  • Strong interpersonal and collaboration skills, with the ability to operate across federated engineering teams and central infrastructure groups.

  • Experience in enablement or platform teams with a track record of scaling best practices across diverse business units.

Education:

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
