Site Reliability Engineer

Boomi

India 3 Years Exp Posted 207d ago

Job Description

What You’ll Do

  • Participate actively in detecting, remediating, and reporting on production incidents, ensuring that SLAs/SLOs are defined and met.

  • Participate in on-call rotation to ensure coverage for planned/unplanned events.

  • Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.

  • Work with your SRE and Engineering counterparts to drive DR exercises, game days, training, and other response-readiness efforts.

  • Collaborate with Service Engineering organizations to build and automate tooling, implement observability best practices, and manage Boomi services in production so that we consistently achieve our market-leading SLA.

  • Improve the scalability and reliability of Boomi’s systems in production.

  • Automate the provisioning and maintenance of Boomi’s infrastructure.

  • Work independently with a minimal level of guidance from technical leadership.

  • Mentor other Boomi engineers, including design collaboration and code reviews.

The Experience You Bring

  • Passionate about SRE, DevOps, automation, and infrastructure platforms. Expert in developing Ansible playbooks and automation for infrastructure as code using Terraform and CloudFormation templates.

  • Expert in defining, measuring, and improving reliability metrics (SLOs, SLIs, error budgets).

  • Strong in implementing observability practices (monitoring, logging, distributed tracing, etc.), preferably using Splunk and New Relic. Experience should extend beyond using existing dashboards to creating them from scratch.

  • Experience conducting and automating DR exercises in the AWS cloud to validate RPOs and RTOs.

  • Strong understanding and working experience with AWS components.

  • Ability to design and implement APIs for use by internal teams.

Bonus Points If You Have

  • 3–5 years of related experience in the software engineering industry, including supporting large-scale software systems in production.

  • Cloud certification (AWS/Azure/GCP) and experience using services such as compute, containers, and databases.

  • Experience in Ansible/Terraform and Python.

  • A grasp of cloud-native concepts, containerization best practices, and cloud security awareness is a strong plus.

  • Experience in observability, including creating dashboards for SLAs/SLIs/SLOs.
