DevOps Engineer

hirist

Kolkata, 6 Years Exp, Posted 1h ago

Job Description

- Own the reliability, availability, and performance of microservices and production workloads.

- Design and improve resilient infrastructure on GCP, with strong emphasis on Cloud Run, Kubernetes, and containerized services.

- Build and maintain observability across logs, metrics, tracing, alerting, and service health so issues are detected early and resolved quickly.

- Improve deployment safety through stronger CI/CD pipelines, release controls, rollback strategies, and environment consistency.

- Lead incident response and production readiness practices, including runbooks, postmortems, on-call hygiene, capacity planning, and resilience testing.

- Reduce operational toil by automating repetitive work and improving tooling for engineers supporting distributed services.

- Partner with development teams to improve the operability, scalability, and fault tolerance of microservices early in the design lifecycle.

- Strengthen cloud security and infrastructure hygiene across IAM, secrets management, workload hardening, and production safeguards.

- Improve service performance, resource efficiency, and cloud cost management without compromising reliability.

- Support architecture and reliability reviews for critical services and high-traffic business events.

Qualifications

- 5+ years of experience in Site Reliability Engineering or closely related DevOps roles with meaningful production ownership.

- Strong experience running production systems on Google Cloud Platform.

- Hands-on experience with Cloud Run, Kubernetes, and container-based microservices in production.

- Strong experience with infrastructure as code, particularly Terraform and Terragrunt.

- Strong understanding of observability using tools such as OpenTelemetry, Cloud Monitoring, New Relic, or equivalent systems.

- Strong understanding of distributed systems, microservice failure modes, reliability engineering, and production debugging.

- Experience building or improving CI/CD pipelines and release workflows in modern engineering environments, including GitHub Actions.

- Ability to write code and automation in one or more languages such as Python or Java.

- Good judgment during incidents and a practical mindset around reliability, recovery, and risk tradeoffs.

- Strong written and verbal communication skills, with the ability to work effectively across engineering teams.

- Experience working with AI tooling and agentic workflows in engineering or operational environments.

- Experience in retail, e-commerce, or other customer-facing environments is a plus.
