Site Reliability Engineer Specialist

globalpayments

Pune · 4 Years Exp · Posted 1h ago

Job Description

  • Participating in architecture and R&D discussions on new technologies or processes that increase the performance and reliability of our systems.

  • Chaos engineering - you’re expected to think laterally about how our systems might fail in theory, design tests to demonstrate how they behave in practice, and then formulate and implement remediation plans, as appropriate.

  • Pushing our systems to their limits, and then coming up with designs for how to get them to the next performance tier.

  • Using DevOps and GitOps practices to improve automation and processes so that self-service becomes possible.

  • Safeguarding reliability: ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.

  • Running “game days” to test assumptions about reliability and learn what will break before it matters to customers.

  • Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.

  • Building systems to proactively monitor the health, performance, and security of our production and non-production virtualized infrastructure.

  • Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don’t get paged when it doesn’t).

  • Troubleshooting system and network issues alongside our Technical Operations Team.

  • Mentoring other engineers in reliability-related skills.

  • Evolving our SDLC, practices, and tooling to account for Site Reliability considerations and best practices.

  • Developing runbooks and improving documentation.
