Site Reliability Engineer

Okta

Bangalore | 3 Years Exp | Posted 346d ago

Collaborate with engineering teams to improve availability, reliability, and observability of their services
Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
Use existing monitoring tools to identify problems, then resolve them or escalate to service teams
Implement changes to enable or improve infrastructure resilience, monitoring, and alerting
Develop and continuously refine SRE tools and processes to improve software delivery, observability, reliability, and operational efficiency
Daily coding, scripting, and development - Go, Terraform, Helm, etc.
Optimize existing systems and eliminate toil through simplification and automation
Define, document, and advocate reliability best practices and policies

You might be a good fit if you:

Have 3+ years of industry experience as a Site Reliability Engineer
Have experience in Golang
Have experience managing infrastructure with Terraform at scale
Are comfortable working with a fully distributed team
Have experience as a software developer in a SaaS environment
Have experience in a production environment supporting large-scale, mission-critical applications
Have demonstrable expertise working with Microsoft Azure and/or Amazon Web Services
Have production on-call experience in a 24/7 cloud-based environment
Have a good understanding of microservices, cloud infrastructure (AWS, Azure, GCP), databases (SQL, NoSQL, key/value), containers (Docker, Kubernetes), web technologies (WebSockets, HTTP), and networking (SSL, routing, VPN)
Have exceptional communication skills, including technical writing in English
Have a systematic problem-solving approach, coupled with a strong sense of ownership and drive
Are comfortable with the Agile software development methodology
Love working as part of a team, but can work effectively in a remote environment where tasks may be self-driven