Site Reliability Engineer
Equifax
Job Description
What you’ll do
- Work in a DevSecOps environment responsible for building and running large-scale, massively distributed, fault-tolerant systems.
- Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
- Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot issues.
- Create new tools and scripts for auto-remediation of incidents, and establish end-to-end monitoring and alerting on all critical aspects of the systems.
- Build infrastructure-as-code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with a cloud CLI, and programming with a cloud SDK).
- Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.
What experience you need
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience.
- 2-5 years of experience in software engineering, systems administration, database administration, and networking.
- 1+ years of experience developing and/or administering software in a public cloud.
- Experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
- Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js.
- Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases.
- System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.).
- Proficiency with continuous integration and continuous delivery (CI/CD) tooling and practices.
- Cloud certification strongly preferred.