Site Reliability Engineer
Equifax
Job Description
What You’ll Do
- You will engage in and improve the software development lifecycle, from inception and design through development, deployment, operation and refinement
- You will influence and design infrastructure, architecture, standards and methods for large-scale systems
- You will support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews
- You will maintain services during deployment and in production by measuring and monitoring key performance and service level indicators, including availability, latency and overall system health
- You will automate system scalability and continually work to improve system resiliency, performance and efficiency
- You will practice sustainable incident response as part of an on-call rotation and through blameless postmortems
- You will remediate tasks within corrective action plans via sustainable, preventative and automated measures whenever possible
What experience you need
- BS degree or equivalent job experience required
- 5+ years of experience developing and/or administering software in public cloud, or equivalent experience
- Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js, or equivalent experience
What could set you apart
- Any valid and active cloud certification
- System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible or SaltStack and/or containers (Docker, Kubernetes, etc.), or equivalent experience
- Demonstrable cross-functional knowledge of systems, storage, networking, security and databases, or equivalent experience
- Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met
- Proficiency with continuous integration and continuous delivery tooling and practices
- Strong analytical and troubleshooting skills
- You have expertise designing, analyzing and troubleshooting large-scale distributed systems
- You take a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- You have experience managing infrastructure as code via tools such as Terraform or CloudFormation
- You are passionate about automation, with a desire to eliminate toil whenever possible
- You've built software or maintained systems in a highly secure, regulated or compliant industry
- You have experience with, and a passion for, working within a DevOps culture and as part of a team