Site Reliability Engineer
Equifax
Job Description
What you’ll do
-
Manage system(s) uptime across cloud-native (AWS, GCP) and hybrid architectures.
-
Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with a cloud CLI, or programming with a cloud SDK).
-
Build CI/CD pipelines to build, test, and deploy applications and cloud architecture patterns, using platform tooling (Jenkins) and cloud-native toolchains.
-
Build automated tooling to deploy service requests that push changes into production.
-
Solve problems and triage issues across a complex distributed architecture service map.
-
Build comprehensive, detailed runbooks to detect issues, remediate them, and restore services.
-
Lead blameless availability postmortems and own the calls to action that prevent recurrences.
-
Be on call for high-severity application incidents and improve runbooks to reduce MTTR.
-
Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.
-
Communicate effectively with technical peers and team members, both in writing and verbally.
What experience you need
-
BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required
-
2+ years of experience developing and/or administering software in public cloud
-
5+ years of experience in languages such as Python, Bash, Java, Go, JavaScript and/or Node.js
-
5+ years of experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
-
5+ years of cross-functional experience with systems, storage, networking, security and databases
-
5+ years of system administration experience, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes, etc.)
-
5+ years of experience working with continuous integration and continuous delivery tooling and practices
What could set you apart
-
You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
-
You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
-
You have experience managing infrastructure as code via tools such as Terraform or CloudFormation
-
You are passionate about automation and want to eliminate toil whenever possible
-
You’ve built software or maintained systems in a highly secure, regulated or compliant industry
-
You thrive in a DevOps culture, with experience in and a passion for working as part of a team