Site Reliability Engineering Manager

HPE

Bangalore | 7 Years Exp | Posted 388d ago

What you’ll do:

Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being.
Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning.
Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services.
Guide the team in building automation, improving observability, and increasing the operational efficiency of our cloud infrastructure.
Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development.
Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning.
Define and track key reliability metrics, and report on team performance and system health to leadership.
Contribute to hiring, onboarding, and career development for SREs.

What you need to bring:

7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
Minimum 2 years of experience managing or leading cloud operations teams.
Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures.
Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools.
Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response.
Familiarity with modern CI/CD automation and tools.
Excellent communication, stakeholder management, and team-building skills.
Experience scaling SRE practices in high-growth or large-scale environments.
Ability to balance long-term reliability initiatives with short-term delivery needs.