SRE Engineer
IBM
Job Description
Your role and responsibilities
- Ensure service reliability, availability, and performance through SLO/SLI-driven engineering.
- Build and maintain monitoring, alerting, and observability systems.
- Manage and improve CI/CD pipelines and deployment automation.
- Automate infrastructure using IaC tools and reduce operational toil.
- Lead incident response, root-cause analysis, and post-mortems.
- Optimize system scalability, performance, and capacity planning.
- Collaborate with development teams to embed reliability into design and operations.
Required education
Bachelor's Degree
Required technical and professional expertise
- At least 2 years of work experience applying Site Reliability Engineering (SRE) principles
- Hands-on experience with CI/CD concepts and tools such as Jenkins, Docker, and Kubernetes
- Strong knowledge of monitoring and observability frameworks, including Prometheus and Grafana
- Experience with Infrastructure as Code (IaC) using Terraform
- Proficiency in basic Python programming and shell scripting
- Understanding of networking fundamentals and observability concepts