Site Reliability Engineer

UPS

CHENNAI 5 Years Exp Posted 340d ago

Job Description

Key Responsibilities:

  • Design, develop, and maintain reliable, scalable, and highly available systems on GCP.
  • Build and manage CI/CD pipelines, infrastructure as code (IaC), and monitoring solutions.
  • Proactively monitor and manage system performance, uptime, and capacity using observability tools.
  • Troubleshoot and resolve infrastructure and application-level issues in real time.
  • Implement and maintain disaster recovery, failover mechanisms, and backup strategies.
  • Automate repetitive tasks and processes to improve efficiency and reduce toil.
  • Participate in on-call rotations, incident management, and root cause analysis (RCA).
  • Ensure compliance with security standards, privacy regulations, and governance policies.
  • Collaborate with cross-functional teams to support DevOps and SRE best practices.
  • Drive improvements in SLAs, SLOs, and error budgets through data-driven insights.

 

Required Qualifications:

  • 5–8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
  • Strong hands-on experience with Google Cloud Platform (GCP) – Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
  • Proficiency in Infrastructure as Code tools like Terraform, Deployment Manager, or CloudFormation.
  • Experience with Kubernetes, Docker, and container orchestration.
  • Proficiency in scripting languages like Python, Shell, or Go.
  • Deep understanding of monitoring and logging tools such as Prometheus, Grafana, Stackdriver, or Datadog.
  • Knowledge of CI/CD tools such as Jenkins, GitLab CI, or Cloud Build.
  • Experience with incident response, postmortem analysis, and site reliability principles.
  • Strong problem-solving and communication skills.
