Principal DevOps Engineer
Providence
Job Description
As an SRE Principal Engineer, you will:
- Play a lead role in the SRE Engineering team, working with the Product Engineering, DevOps & Operations teams, and play a pivotal role in the uptime of applications & services.
- Act as the first line of defense for the production health of HI applications & product services.
- Collaborate extensively with Product teams on SRE practices, SLOs/SLIs & project roadmaps.
- Design, build, and maintain tools and frameworks that support deployment automation and application health checks.
- Play a vital role in driving a regular cadence with Product teams on SLI/SLO dashboards, operational metrics, reliability & availability.
- Own the end-to-end availability and performance of key services, and build automation to prevent problems from recurring. Automate the response to all non-exceptional service conditions.
- Lead by example, mentor the team and establish credibility through quality technical execution.
- Manage on-call rotations across geo-locations, using a follow-the-sun model.
- Apply sound troubleshooting skills and participate in Severity issue & CODE RED calls.
- Create a knowledge repository of Severity issues & best practices.
- Take the initiative to set up knowledge-sharing sessions that build awareness of different products & their functionality.
What would your day look like?
- Monitor SRE dashboards, highlight deviations from SLOs/SLIs, and work with the Incident Management Command Center & Product teams to fix any potential issues.
- Act as SME & quarterback in Severity 1 & 2 issues and CODE RED situations. Guide product teams toward faster resolution.
- Partner with the Engineering team on measuring & improving the availability & reliability of applications & product services. Monitor SRE dashboards & alerts, and find opportunities to create more meaningful monitors that increase reliability & availability for users.
- Perform RCA and track action items to closure.
- Create automation frameworks & pipelines for deploying monitors, and build SRE dashboards that can be consumed by various product teams.
- Track & deliver the assigned sprint items in a timely manner, with high quality.