Site Reliability Engineer

worldpay

Bengaluru

What you'll own

Own and evolve CI/CD pipelines to support rapid, reliable, and secure deployments.
Define and implement release readiness standards, including automated checks, rollback strategies, and deployment validation.
Collaborate with product engineering to ensure new features meet operational and release criteria.
Build tooling and automation to streamline build, test, and release workflows.
Improve observability and traceability of releases across environments.
Partner with other SREs, platform and operations teams to ensure infrastructure supports scalable and resilient deployments.
Lead post-release reviews and participate in incident retrospectives to drive continuous improvement.
Participate in on-call rotations to support critical releases and troubleshoot deployment issues.
Guide product engineering teams on how to achieve operational excellence for new product and feature launches.
Proactively find and analyze reliability problems across our stack, then design and implement software solutions that are secure, scalable, and highly available.

What you bring

An operational mindset and a drive to achieve operational excellence, with at least 7 years of experience in software/systems engineering focused on build and release.
Proven experience designing and maintaining CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
Strong understanding of release management, deployment strategies, and rollback mechanisms.
Experience with AWS, infrastructure-as-code tools (e.g., Terraform), and container tooling (e.g., Docker, Kubernetes).
Proficiency in PHP/C# and scripting languages for automation.
Familiarity with observability tools (e.g., Datadog, OpenTelemetry) to monitor release health.
Experience implementing SLI/SLO/SLA frameworks to measure release success.
Excellent communication and collaboration skills across cross-functional teams.
Strong skills in observability, debugging, and performance tuning.
Ability to perform capacity planning and ensure the architecture scales to support fluctuating volumes.
Understanding of how to implement solutions that reduce service disruptions and improve MTTD/MTTR.
Bachelor's or Master's degree in Computer Science or equivalent experience.
Ability to work collaboratively across many teams, influencing decisions and setting standards.
On-call availability to ensure swift resolution of issues outside regular business hours.

Bonus if you have

Experience designing and building reliable systems capable of handling high throughput and low latency.
Experience with feature flagging, canary deployments, and blue-green release strategies.
Familiarity with PHP, TypeScript, Node.js.
Exposure to multi-cloud environments (AWS, GCP, Azure).
Experience in fast-paced, high-growth environments.
Positive outlook, strong work ethic, and responsiveness to internal and external clients.