Lead Digital Engineer

Darwinbox

Bangalore 8 Years Exp Posted 33d ago

Job Description

Roles & Responsibilities

 

EKS Infrastructure Ownership

• Own end-to-end design, provisioning, and management of Amazon EKS clusters using Terraform.
• Define and maintain node group strategies including managed node groups, Fargate profiles, and spot/on-demand mix for cost optimization.
• Manage EKS upgrades, control plane configurations, and Kubernetes version lifecycle.
• Implement Cluster Autoscaler and Karpenter for dynamic workload scaling.
• Design multi-environment (Dev/Staging/Prod) EKS architectures with strong environment isolation.

 

CI/CD Pipeline Engineering

• Design, build, and maintain CI/CD pipelines using GitHub Actions or Jenkins for automated build, test, and deployment workflows.
• Implement deployment strategies including blue-green, canary, and rolling deployments to ensure zero-downtime releases.
• Integrate pipeline quality gates with security scanning (SAST/DAST), container image scanning, and policy compliance checks.
• Develop automated rollback mechanisms and deployment validation frameworks.
• Standardize pipeline templates and reusable workflow libraries across engineering teams.

 

Infrastructure as Code (IaC)

• Author and maintain Terraform modules for all AWS infrastructure, including VPCs, EKS, IAM, S3, ECR, and RDS.
• Enforce IaC standards, module versioning, and Terraform state management using remote backends (S3 + DynamoDB).
• Implement drift detection mechanisms to continuously validate live infrastructure against IaC definitions.
• Manage Helm chart development and lifecycle for microservices deployments on EKS.

 

Security & Compliance

• Design and enforce least-privilege IAM policies, IRSA (IAM Roles for Service Accounts), and service mesh security policies.
• Manage secrets using AWS Secrets Manager and Parameter Store, integrated with Kubernetes workloads.
• Implement network security using VPC security groups, NACLs, and Kubernetes Network Policies.
• Drive infrastructure security compliance, vulnerability remediation, and audit readiness.

 

Observability & Incident Response

• Build and maintain observability stacks using Prometheus, Grafana, and OpenTelemetry for metrics, logs, and distributed tracing.
• Define SLIs, SLOs, and alerting thresholds for production Kubernetes workloads.
• Lead incident response, root cause analysis (RCA), and post-mortem processes for infrastructure events.
• Implement auto-remediation for common failure patterns to reduce MTTR.

 

Cost Optimization & Capacity Planning

• Continuously analyze AWS spend and implement right-sizing, Reserved Instance, and Savings Plans strategies.
• Build cost attribution frameworks with tagging standards and chargeback models.
• Forecast capacity requirements based on business growth and workload patterns.

 

Collaboration & Mentorship

• Serve as the primary DevOps point of contact for product engineering teams, guiding infrastructure design decisions.
• Mentor junior and mid-level DevOps engineers, establishing best practices and runbook documentation.
