Site Reliability Engineering Manager

HPE

Bangalore | 7 years' experience | Posted 388 days ago

Job Description

What you’ll do:

  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being.
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning.
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services.
  • Guide the team in building automation, improving observability, and increasing the operational efficiency of our cloud infrastructure.
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development.
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning.
  • Define and track key reliability metrics, and report on team performance and system health to leadership.
  • Contribute to hiring, onboarding, and career development for SREs.

What you need to bring:

  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
  • Minimum 2 years of experience managing or leading cloud operations teams.
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures.
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools.
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response.
  • Familiarity with modern CI/CD practices and tooling.
  • Excellent communication, stakeholder management, and team-building skills.
  • Experience scaling SRE practices in high-growth or large-scale environments.
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
