Site Reliability Engineer

CGI

Bangalore 5 Years Exp Posted 7d ago

Job Description

The OpenShift Engineer is responsible for the day-to-day operation, maintenance, and support of Red Hat OpenShift Container Platform (OCP) environments. The role ensures that clusters are secure, highly available, and performant, while providing platform services to development teams for the rapid delivery of containerized applications.
Contributing Responsibilities
This position combines hands-on system administration with automation, monitoring, and collaboration across cross-functional teams.
Own the reliability and observability of our Kubernetes and OpenShift container platforms, proactively identifying risks before they become incidents.
Lead the response to production degradations, conducting thorough post-mortems and driving systemic fixes to eliminate repeat failures.
Design and implement Ansible playbooks for automated deployment, configuration management, rolling upgrades, and day-to-day operations across Kubernetes and OpenShift clusters.
Build Python tooling to automate health checks, operational workflows, alerting integrations, and pipeline diagnostics, reducing toil and improving team efficiency.
Work closely with DevOps, SRE, Application, and Security teams to resolve platform issues. Conduct onboarding sessions and technical workshops, and produce runbooks for developers.
Enforce change management policies for upgrades, configuration changes, and access control.
Drive capacity planning, cluster scaling, and architecture improvements in collaboration with senior and platform engineers.
Contribute to and continuously improve runbooks and internal knowledge, raising the bar for how the team operates.
Participate in an on-call rotation with a strong culture of sustainable incident management.
Technical & Behavioral Competencies
Hands-on experience with Red Hat OpenShift 3.x & 4.x (installation, upgrades, day-to-day admin) and a strong grasp of Kubernetes primitives (Pods, Deployments, Services, Ingress, CRDs, Operators).
Strong Linux/Unix fundamentals and confidence diagnosing issues at the infrastructure level.
Networking (OpenShift SDN/OVN, Ingress/Egress, Service Mesh basics, load balancers, NetworkPolicy) and storage (CSI drivers, persistent volumes, snapshots, …).
Ansible, Terraform, Helm, OpenShift Operator SDK, Bash/Python scripting.
Experience with Prometheus, Alertmanager, Grafana, and OpenTelemetry basics.
