Site Reliability Engineer
Equifax
Job Description
What you’ll do
- Manage system uptime across cloud-native (AWS, GCP) and hybrid architectures.
- Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).
- Build CI/CD pipelines for build, test, and deployment of application and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.
- Build automated tooling to deploy service requests and push changes into production. Build comprehensive, detailed runbooks to manage, detect, remediate, and restore services.
- Solve problems and triage complex distributed architecture service maps. Serve on call for high-severity application incidents and improve runbooks to reduce MTTR.
- Lead blameless availability postmortems and own the call to action to prevent recurrences.
What experience you need
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required.
- 5-7 years of experience in software engineering, systems administration, database administration, and networking.
- 2+ years of experience developing and/or administering software in public cloud.
- Experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
- Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js.
- Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases.
- System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.).
- Proficiency with continuous integration and continuous delivery tooling and practices.
- Cloud certification strongly preferred.