Site Reliability Engineer

worldpay

Bengaluru NM Years Exp Posted 178d ago

Job Description

What you'll own

  • Own and evolve CI/CD pipelines to support rapid, reliable, and secure deployments.

  • Define and implement release readiness standards, including automated checks, rollback strategies, and deployment validation.

  • Collaborate with product engineering to ensure new features meet operational and release criteria.

  • Build tooling and automation to streamline build, test, and release workflows.

  • Improve observability and traceability of releases across environments.

  • Partner with other SREs and with platform and operations teams to ensure infrastructure supports scalable and resilient deployments.

  • Lead post-release reviews and participate in incident retrospectives to drive continuous improvement.

  • Participate in on-call rotations to support critical releases and troubleshoot deployment issues.

  • Guide product engineering teams on how to achieve operational excellence for new product and feature launches.

  • Proactively find and analyze reliability problems across our stack, and design and implement software solutions that are secure, scalable, and highly available.

 

What you bring

  • An operational mindset and a drive to achieve operational excellence, with at least 7 years of experience in software/system engineering focused on build and release.

  • Proven experience designing and maintaining CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).

  • Strong understanding of release management, deployment strategies, and rollback mechanisms.

  • Experience with AWS, infrastructure-as-code tools (e.g., Terraform), and container tooling (e.g., Docker, Kubernetes).

  • Proficiency in PHP/C# and scripting languages for automation.

  • Familiarity with observability tools (e.g., Datadog, OpenTelemetry) to monitor release health.

  • Experience implementing SLI/SLO/SLA frameworks to measure release success.

  • Excellent communication and collaboration skills across cross-functional teams.

  • Strong skills in observability, debugging, and performance tuning.

  • Ability to perform capacity planning and ensure an architecture scales to support fluctuating volumes.

  • Understanding of how to implement solutions that reduce service disruptions and improve MTTD/MTTR.

  • Bachelor's or Master's degree in Computer Science or equivalent experience.

  • Ability to work collaboratively across many teams, influencing decisions and setting standards.

  • On-call availability to ensure swift resolution of issues outside regular business hours.

 

Bonus if you have

  • Experience designing and building reliable systems capable of handling high throughput at low latency.

  • Experience with feature flagging, canary deployments, and blue-green release strategies.

  • Familiarity with PHP, TypeScript, Node.js.

  • Exposure to multi-cloud environments (AWS, GCP, Azure).

  • Experience in fast-paced, high-growth environments.

  • Positive outlook, strong work ethic, and responsive to internal and external clients.
