Site Reliability Engineer
Equifax
Job Description
What you’ll do
-
Work in a DevSecOps environment responsible for building and running large-scale, massively distributed, fault-tolerant systems.
-
Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
-
Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot issues.
-
Create new tools and scripts for auto-remediation of incidents, and establish end-to-end monitoring and alerting on all critical aspects.
-
Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).
-
Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.
What experience you need
-
BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required.
-
1+ years of experience developing and/or administering software in the public cloud.
-
2+ years of experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
-
2+ years of experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js.
-
2+ years of cross-functional experience with systems, storage, networking, security, and databases.
-
2+ years of experience with system administration, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.).
-
2+ years of experience with continuous integration and continuous delivery tooling and practices.