SRE (Site Reliability Engineer)
Cognizant
Job Description
Build and run cloud technology at scale, with experience in at least one of AWS, GCP, or Azure
Responsible for the ARQL (Availability, Reliability, Quality, and Life Cycle) of services
Expertise in CI/CD pipelines, the full software stack, application runtimes, operating systems, middleware, data, and networks
Experience working with teams to guide and develop scalable, supportable solutions
Comfortable coding in one or more languages, with the ability to automate as much as possible
Define SLOs, SLIs, and error budgets for the services; define frameworks for teams to measure and improve the agreed SLOs
Demand forecasting and capacity planning; responsible for release and change management
Anchor meetings with production, application, vendor, executive, and key internal stakeholders to communicate program status, risks, technology challenges and solutions, solution practicality, and timelines
Participate in major incident management and ensure that blameless post-mortems are conducted
Develop monitoring and feedback mechanisms in line with industry best practices and technology trends