Engineering Manager - SRE
HashiCorp
Job Description
What you'll do (responsibilities)
- Lead and manage incident response and disaster recovery efforts across high-availability SaaS environments.
- Design and execute robust disaster recovery strategies to ensure alignment with Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
- Drive compliance with organizational and industry standards by embedding best practices for disaster recovery, resilience, and fault tolerance, leveraging Chaos Engineering where appropriate.
- Define and evolve the incident response framework to enable rapid, coordinated resolution of operational disruptions.
- Proactively identify and mitigate potential points of failure through automation and predictive tooling to enhance system stability.
- Analyze incident patterns and root causes to drive continuous improvement in reliability engineering practices and response processes.
- Develop, maintain, and scale engineering tools for real-time incident detection, diagnostics, and automated remediation.
- Collaborate with cross-functional teams to build frameworks for incident simulation, root cause analysis, and reproducibility at scale.
- Own and lead DR drills and chaos testing exercises, documenting findings and delivering actionable recommendations for resilience enhancement.
- Partner closely with development, operations, and security teams to ensure cohesive incident management and comprehensive post-incident reviews.
What you’ll need (basic qualifications)
- Minimum of 12 years of professional experience, including at least 2 years in a managerial capacity on a team focused on Site Reliability Engineering (SRE).
- Demonstrate hands-on leadership in SRE for high-availability SaaS environments with a strong focus on reliability and operational excellence.
- Possess a strong background in cloud-based software development and have led teams addressing scalability, performance, and reliability challenges.
- Demonstrate excellent leadership and project management skills, with a track record of mentoring engineers and driving cross-functional collaboration.
- Show a proactive approach to problem-solving, capable of anticipating and mitigating potential issues before they impact customers.
- Have experience with agile methodologies, lead teams with empathy, and are committed to delivering high-quality, reliable software solutions.