Site Reliability Engineer

UPS

Chennai | 5 years of experience

Key Responsibilities:

Design, develop, and maintain reliable, scalable, and highly available systems on GCP.
Build and manage CI/CD pipelines, infrastructure as code (IaC), and monitoring solutions.
Proactively monitor and manage system performance, uptime, and capacity using observability tools.
Troubleshoot and resolve infrastructure and application-level issues in real time.
Implement and maintain disaster recovery, failover mechanisms, and backup strategies.
Automate repetitive tasks and processes to improve efficiency and reduce toil.
Participate in on-call rotations, incident management, and root cause analysis (RCA).
Ensure compliance with security standards, privacy regulations, and governance policies.
Collaborate with cross-functional teams to support DevOps and SRE best practices.
Drive improvements in SLAs, SLOs, and error budgets through data-driven insights.

Required Qualifications:

5–8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
Strong hands-on experience with Google Cloud Platform (GCP) – Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
Proficiency in Infrastructure as Code tools such as Terraform, Deployment Manager, or CloudFormation.
Experience with Kubernetes, Docker, and container orchestration.
Proficiency in scripting and programming languages such as Python, Shell, or Go.
Deep understanding of monitoring and logging tools such as Prometheus, Grafana, Stackdriver, or Datadog.
Knowledge of CI/CD tools such as Jenkins, GitLab CI, or Cloud Build.
Experience with incident response, postmortem analysis, and site reliability principles.
Strong problem-solving and communication skills.