DevOps / SRE Lead

ripplehire

New Delhi | 3 Years Exp | Posted 32d ago

Job Description

 

Key Responsibilities

•      Design, build, and operate cloud-native infrastructure on Azure and in on-premises data centers using infrastructure-as-code principles (Ansible, Terraform).

•      Architect and manage Kubernetes (AKS / self-managed) clusters at scale; enforce GitOps workflows.

•      Drive adoption of Platform Engineering practices: build internal developer platforms (IDPs) using Backstage or an equivalent to reduce cognitive load on dev teams.

•      Manage and optimise container image lifecycle, registries (ACR, ECR), and multi-environment deployment strategies (blue-green, canary, rolling).

•      Implement full-stack observability using the OpenTelemetry standard: metrics (Prometheus / Thanos), logs (Loki / EFK / OpenSearch), and traces (Jaeger, Tempo).

•      Build and maintain Grafana dashboards, runbooks, and SLI/SLO frameworks; drive error-budget culture with tech/product teams.

•      Lead incident response and continuous reliability improvement.

•      Embed security into every layer: network policies, RBAC, OPA/Gatekeeper policies in Kubernetes, image signing (Cosign/Notary).

•      Manage secrets hygiene, certificate lifecycle (cert-manager), and cloud IAM with least-privilege principles.

•      Ensure compliance alignment (SOC 2, PCI-DSS awareness) for production workloads.

•      Operate and optimise event-streaming infrastructure: Apache Kafka, NATS, or RabbitMQ.

•      Support database reliability for PostgreSQL, MSSQL, MongoDB, Redis; coordinate DBA activities for backups, failover, and performance.

•      Collaborate closely with application development, QA, product, and security teams to align DevOps strategy with business goals.

•      Manage vendor and tool evaluations, and present infrastructure roadmaps to technical leadership.

Skills

 

•      Deep hands-on experience with Azure (AKS, App Service, Azure Monitor, Key Vault, Azure DNS).

•      Proficiency in Terraform (modules, remote state, workspaces) and Ansible for IaC.

•      Strong Linux systems administration; networking fundamentals (DNS, TLS, load balancers, WAF, CDN - Akamai).

•      Expert-level Kubernetes operations: Helm chart authoring, Kustomize, admission webhooks, HPA, cluster upgrades, multi-tenancy patterns.

•      Experience with service mesh (Istio, Linkerd), traffic shaping, and observability.

•      Container runtime security (Falco, Snyk) and registry management.

•      Proficiency in at least two of Python, Go, and Bash for tooling, automation, and custom operators/controllers.

•      Experience writing Kubernetes operators or CRDs is a strong plus.

•      Production experience with Prometheus + Alertmanager + Thanos/Cortex, Grafana, Loki, OpenTelemetry Collector, and Jaeger or Tempo.
