Site Reliability Engineer

Equifax

Trivandrum 5 Years Exp Posted 473d ago

Job Description

What you’ll do

  • Manage system(s) uptime across cloud-native (AWS, GCP) and hybrid architectures.

  • Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).

  • Build CI/CD pipelines to build, test and deploy applications and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.

  • Build automated tooling that deploys service requests to push changes into production.

  • Troubleshoot and triage problems across complex distributed architecture service maps.

  • Build comprehensive, detailed runbooks to detect, remediate and restore services.

  • Lead blameless availability postmortems and own the action items to prevent recurrences.

  • Be on call for high-severity application incidents and improve runbooks to reduce MTTR.

  • Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.

  • Communicate effectively with technical peers and team members in both written and verbal formats.


What experience you need

  • 5+ years of experience developing and/or administering software in public cloud

  • 5+ years of experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met

  • 5+ years of cross-functional experience with systems, storage, networking, security and databases

  • 5+ years of system administration experience, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes, etc.)

  • 5+ years of experience working with continuous integration and continuous delivery tooling and practices

  • Good working knowledge of GCP
