Lead DevOps / Cloud Engineer — Multimodal Search Platform

zigya

New Delhi | 6 Years Exp | Posted 47d ago

Job Description

  • 6–10 years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Platform Engineering roles within high-scale technology environments.

  • Strong hands-on expertise with at least one major cloud platform (AWS, GCP, or Azure), including networking, compute, storage, and managed Kubernetes services.

  • Deep experience operating production Kubernetes environments at scale, including autoscaling, cluster upgrades, workload orchestration, and resilience design.

  • Proven experience implementing Infrastructure as Code using Terraform (preferred) or equivalent tooling.

  • Strong understanding of distributed systems reliability, including load balancing, caching strategies, asynchronous queues, and failure recovery patterns.

  • Experience designing and managing CI/CD pipelines using modern tooling (GitHub Actions, GitLab CI, ArgoCD, Jenkins, or equivalent).

  • Hands-on experience building observability stacks using tools such as Prometheus, Grafana, ELK/OpenSearch, Datadog, or OpenTelemetry.

  • Experience supporting GPU workloads and AI inference systems, including containerized model deployment and performance optimization for production ML systems.

  • Familiarity with AI model serving frameworks such as Triton Inference Server, vLLM, TGI, or similar platforms is strongly preferred.

  • Strong scripting and automation skills (Python, Bash, or Go preferred).

  • Solid understanding of networking, security best practices, secrets management, and cloud cost optimization strategies.

  • Experience working in fast-moving startup or scale-up environments with high ownership expectations.

 
