Sr. Engineer - HashiCorp Cloud DR (Hybrid)
HashiCorp
Job Description
What you’ll do (responsibilities)
- Implement best practices for system reliability and disaster recovery, including proactive identification of potential failure points and the development of automated mitigations.
- Design and execute comprehensive DR testing strategies to identify bottlenecks and failure points that affect recovery point objectives (RPO) and recovery time objectives (RTO) across our cloud products.
- Drive initiatives around DR compliance and implement best practices and technologies to improve system resilience, using the chaos testing framework to ensure high availability and fault tolerance.
- Conduct rigorous performance benchmarking and testing to validate the efficiency and scalability of the tooling we plan to build for orchestrating DR across our cloud products.
- Work closely with engineering and product teams to integrate operational readiness into the development lifecycle, enhancing product stability and user satisfaction.
- Build and refine tools and frameworks for automated testing, environment simulation, and incident reproduction, reducing manual effort and increasing test coverage.
- Conduct mock drills and drive chaos tests in collaboration with partner teams, analyzing test results, documenting findings, and making actionable recommendations for systemic improvements.
- Share your knowledge and expertise with team members, fostering a culture of learning and continuous improvement.
What you’ll need (basic qualifications)
- 6+ years of experience in software development, reliability engineering, systems engineering, or non-functional testing roles, with a focus on disaster recovery or backup and recovery of cloud-based systems.
- Commitment to pursuing a career in reliability engineering.
- Proficiency in Go (Golang) or a scripting language.
- Hands-on experience with version control tools and platforms such as Git and GitLab.
- Understanding of microservices architecture.
- Good understanding of CI/CD processes and of maintaining high-quality pipelines.
- Experience collecting metrics, building data pipelines to analyze them, and building dashboards that show the availability and status of components across the cloud.
- Exposure to cloud technologies (AWS, Azure, or GCP) and container orchestration technologies such as Nomad or Kubernetes.
- Effective communication and collaboration skills, capable of working with cross-functional teams and articulating technical concepts to diverse audiences.