Ekshvaku Tech Innovation - AWS DevOps/Platform Engineer

hirist

Bangalore | 5 Years Exp | Posted 46d ago

Job Description

Infrastructure as Code (Terraform):

- Own, maintain, and evolve a large library of Terraform modules that provision the entire AWS environment across development and production accounts

- Manage EKS cluster configurations, including managed node groups and Spot/Fleet instance node groups (cost-optimized, achieving up to 70% savings vs. On-Demand pricing)

- Provision and maintain supporting infrastructure: VPC, subnets, security groups, ALB, ACM certificates, Route53 DNS, SQS queues, SES email, and EFS volumes

- Add new modules as infrastructure requirements evolve, and ensure all resources are reproducible and version-controlled

- Apply Terraform changes safely across environments using Terraform workspaces and remote state backends

Kubernetes & Container Orchestration:

- Operate and maintain the AWS EKS cluster with both spot/fleet and on-demand worker node groups

- Deploy and manage 16+ microservices on Kubernetes using Helm charts (four custom charts: generic deployments, one-time jobs, cron jobs, and ingress)

- Configure and tune Horizontal Pod Autoscalers (HPA), Pod Disruption Budgets (PDB), and Persistent Volume Claims (PVC) per service

- Manage Kubernetes ingress, service accounts, RBAC, and ConfigMaps/Secrets

- Maintain the Helm chart repository (versioning, publishing, GitHub Actions pipeline)

- Debug pod failures, resource constraints, and node scheduling issues

CI/CD Pipeline Management:

- Own multiple GitHub Actions workflows covering PR validation, auto-deployment to dev, and production releases

- Enforce a two-part release flow: (1) PR checks (build, unit tests, commit linting, manual approvals), then (2) auto-deploy to the dev environment on merge to the development branch, and release to production via semver tags (vX.Y.Z)

- Maintain build pipelines for Go microservices (multi-stage Docker builds), Node.js services, and Helm charts

- Manage AWS ECR image repositories: image pushes, tagging, and lifecycle policies

- Configure Slack notifications for deployment failures and pipeline events

- Build and improve deployment automation, reducing manual intervention in release processes
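The two-part release flow in the CI/CD bullets above comes down to mapping each Git ref to a target environment. The following is a minimal Go sketch of that routing decision. It assumes GitHub's standard `refs/heads/...` and `refs/tags/...` ref format, and it assumes plain vX.Y.Z tags with no pre-release or build suffix. The function name `deployTarget` and the environment labels are hypothetical and are not taken from the actual workflows.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverTag matches release tags of the form vX.Y.Z (e.g. v1.4.2).
// Assumption: no pre-release/build suffixes; leading zeros are rejected,
// as in the Semantic Versioning spec.
var semverTag = regexp.MustCompile(`^refs/tags/v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// deployTarget (hypothetical helper) maps a Git ref to an environment:
// merges to the development branch go to dev, vX.Y.Z tags go to
// production, and every other ref returns "" (no deployment).
func deployTarget(ref string) string {
	switch {
	case ref == "refs/heads/development":
		return "dev"
	case semverTag.MatchString(ref):
		return "production"
	default:
		return ""
	}
}

func main() {
	for _, ref := range []string{
		"refs/heads/development",
		"refs/tags/v1.4.2",
		"refs/tags/v1.4",
		"refs/heads/feature/x",
	} {
		fmt.Printf("%-24s -> %q\n", ref, deployTarget(ref))
	}
}
```

Incomplete tags such as `v1.4` and feature branches map to an empty string, meaning nothing is deployed. In a GitHub Actions setup, this decision would more typically be expressed through the workflow's `on: push` branch and tag filters than in code. The sketch only makes the routing rule explicit.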

Monitoring & Observability:

- Operate SigNoz for APM: configure service traces, metrics dashboards, and alerts across all microservices

- Manage CloudWatch log groups per service (integrated via Fluent Bit log shipping from Kubernetes)

- Maintain Grafana dashboards for infrastructure-level metrics

- Monitor Prometheus metrics exposed by backend services

- Maintain StatusPage.io public status pages for our services

- Define alerting rules and on-call runbooks; own incident response and post-mortems

Security & Secrets Management:

- Manage AWS Secrets Manager for all service credentials (MongoDB, Wasabi, application configs)

- Administer AWS Client VPN with SSO integration for secure developer access to private infrastructure

- Maintain IAM roles, policies, and service accounts following least-privilege principles

- Manage ACM certificates and ensure TLS is enforced across all ingress endpoints

- Operate ClamAV for malware scanning of user-uploaded files

- Support the SpiceDB fine-grained authorization service and its migration tooling

- Participate in compliance reviews and apply security best practices across the AWS account

Networking & Cloud Architecture:

- Manage a multi-VPC architecture: separate VPCs for dev and production environments, with VPC peering for controlled cross-environment access

- Configure MongoDB Atlas PrivateLink connectivity so that database clusters are accessible only from within the designated VPC

- Maintain bastion host configuration for emergency database access

- Design and implement network segmentation, security group rules, and NACLs

- Manage Route53 DNS records and ALB routing rules

Collaboration with Engineering Teams:

- Partner with Go and Node.js backend engineers to containerize new services and onboard them to the deployment pipeline

- Work with frontend engineers on AWS Amplify deployments for the Nuxt.js / Vue 3 PWA

- Provide runbooks and documentation for common debugging workflows (e.g., CloudWatch log tailing, VPN access, EKS pod debugging)

- Define and enforce infrastructure standards, naming conventions, and tagging strategies across environments

Our Stack You'