Site Reliability Engineer, AVP
NatWest
Job Description
You’ll also be:
- Conducting capacity planning exercises to make sure cloud resources can handle anticipated traffic spikes and growth
- Implementing and maintaining monitoring, logging, and alerting systems to provide insight into the health and performance of cloud infrastructure and applications
- Delivering automation solutions to minimise or eliminate manual tasks associated with maintaining and supporting the applications
- Ensuring an in-depth understanding of the full tech stack on which the application resides and depends
- Identifying alerting and monitoring requirements for an application, based on a sound understanding of customer journeys
- Evaluating the resilience of the end-to-end tech stack on which the applications depend, and addressing weaknesses
- Seeking to reduce the frequency of hand-offs in the end-to-end resolution of customer-impacting incidents