Site Reliability Engineer Specialist

globalpayments

Pune | 4 Years Exp | Posted 1h ago

Job Description

Participating in architecture and R&D discussions on new technologies and processes to increase the performance and reliability of our systems.
Chaos engineering - you’re expected to think laterally about how our systems might fail in theory, design tests to demonstrate how they behave in practice, and then formulate and implement remediation plans, as appropriate.
Pushing our systems to their limits, and then coming up with designs for how to get them to the next performance tier.
Using DevOps and GitOps practices to improve automation and processes so that self-service becomes possible.
Safeguarding reliability: ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.
Running “game days” to test assumptions about reliability and learn what will break before it matters to customers.
Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.
Building systems to proactively monitor the health, performance and security of our production and non-production virtualized infrastructure.
Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don’t get paged when it doesn’t).
Troubleshooting systems and network issues, alongside our Technical Operations Team.
Mentoring other engineers in reliability-related skills.
Evolving our SDLC, practices, and tooling to account for Site Reliability considerations and best practices.
Developing runbooks and improving documentation.
