Site Reliability Engineer
Worldpay
Job Description
What you'll own
-
Own and evolve CI/CD pipelines to support rapid, reliable, and secure deployments.
-
Define and implement release readiness standards, including automated checks, rollback strategies, and deployment validation.
-
Collaborate with product engineering to ensure new features meet operational and release criteria.
-
Build tooling and automation to streamline build, test, and release workflows.
-
Improve observability and traceability of releases across environments.
-
Partner with other SREs and with platform and operations teams to ensure the infrastructure supports scalable, resilient deployments.
-
Lead post-release reviews and participate in incident retrospectives to drive continuous improvement.
-
Participate in on-call rotations to support critical releases and troubleshoot deployment issues.
-
Guide product engineering teams on how to achieve operational excellence for new product and feature launches.
-
Proactively find and analyze reliability problems across our stack, then design and implement software solutions that are secure, scalable, and highly available.
What you bring
-
An operational mindset and a drive to achieve operational excellence, with at least 7 years of experience in software/systems engineering focused on build and release.
-
Proven experience designing and maintaining CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
-
Strong understanding of release management, deployment strategies, and rollback mechanisms.
-
Experience with AWS, infrastructure-as-code tools (e.g., Terraform), and container platforms (e.g., Docker, Kubernetes).
-
Proficiency in PHP/C# and scripting languages for automation.
-
Familiarity with observability tools (e.g., Datadog, OpenTelemetry) to monitor release health.
-
Experience implementing SLI/SLO/SLA frameworks to measure release success.
-
Excellent communication and collaboration skills across cross-functional teams.
-
Strong skills in observability, debugging, and performance tuning.
-
Ability to perform capacity planning and ensure that an architecture scales to support fluctuating volumes.
-
Understanding of how to implement solutions that reduce service disruptions and improve MTTD/MTTR.
-
Bachelor's or Master's degree in Computer Science or equivalent experience.
-
Ability to work collaboratively across many teams, influencing decisions and setting standards.
-
On-call availability to ensure swift resolution of issues outside regular business hours.
Bonus if you have
-
Experience designing and building reliable, high-throughput, low-latency systems.
-
Experience with feature flagging, canary deployments, and blue-green release strategies.
-
Familiarity with PHP, TypeScript, Node.js.
-
Exposure to multi-cloud environments (AWS, GCP, Azure).
-
Experience in fast-paced, high-growth environments.
-
Positive outlook, strong work ethic, and responsiveness to internal and external clients.