SRE automation and platform support
FIS
Job Description
What you will be doing:
- Understanding project KPIs, SLIs, SLOs, MTTD, MTTR, error budgets, and chaos engineering, and eliminating toil through automation
- Exploring observability tools and creating and implementing dashboards
- Running the production environment by monitoring availability and taking a holistic view of system health
- Incident management: handling incidents, participating in blameless postmortems, performing root cause analysis, and conducting post-incident reviews
- Developing scripts to reduce toil and automate repetitive tasks, including issue-resolution scripts
- Measuring and optimizing system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Implementing various development, testing, and automation tools
- Setting up tools and required infrastructure
- Monitoring and measuring customer experience and KPIs
- Striving for continuous improvement and building continuous integration, continuous delivery, and continuous deployment (CI/CD) pipelines
What you bring:
- 5 to 9 years of experience supporting Unix/Linux/Windows-based application environments
- Knowledge of any RDBMS or NoSQL database
- Good knowledge of the application support domain
- Experience with system and application monitoring and observability tools: Splunk, Prometheus, Grafana, Dynatrace
- Hands-on experience writing automation scripts in PowerShell, Python, or shell
- Exposure to the latest SRE, cloud, and DevOps technologies, plus knowledge of containers and tools such as Docker and Kubernetes/OpenShift
- Skills in using tools like Terraform and Ansible to automate infrastructure management