Ekshvaku Tech Innovation - AWS DevOps/Platform Engineer
Job Description
- Own, maintain, and evolve a large library of Terraform modules that provision the entire AWS environment across development and production accounts
- Manage EKS cluster configurations including managed node groups and spot/fleet instance node groups (cost-optimized, achieving up to 70% savings vs on-demand)
- Provision and maintain supporting infrastructure: VPC, subnets, security groups, ALB, ACM certificates, Route53 DNS, SQS queues, SES email, and EFS volumes
- Add new modules for evolving infrastructure requirements and ensure all resources are reproducible and version-controlled
- Apply Terraform changes safely across environments using Terraform workspaces and remote state backends
Kubernetes & Container Orchestration :
- Operate and maintain the AWS EKS cluster with both spot/fleet and on-demand worker node groups
- Deploy and manage 16+ microservices on Kubernetes using Helm charts (4 custom charts: generic deployments, one-time jobs, cron jobs, and ingress)
- Configure and tune Horizontal Pod Autoscalers (HPA), Pod Disruption Budgets (PDB), and Persistent Volume Claims (PVC) per service
- Manage Kubernetes ingress, service accounts, RBAC, and ConfigMaps/Secrets
- Maintain the Helm chart repository (versioning, publishing, GitHub Actions pipeline)
- Debug pod failures, resource constraints, and node scheduling issues
CI/CD Pipeline Management :
- Own multiple GitHub Actions workflows covering PR validation, auto-deployment to dev, and production releases
- Enforce a two-part release flow: (1) PR checks (build, unit tests, commit linting, manual approvals) → (2) auto-deploy on merge to development for the dev environment; semver tag (vx.y.z) releases for production
- Maintain build pipelines for Go microservices (multi-stage Docker builds), Node.js services, and Helm charts
- Manage AWS ECR image repositories: pushing, tagging, and lifecycle policies
- Configure Slack notifications for deployment failures and pipeline events
- Build and improve deployment automation, reducing manual intervention in release processes
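To illustrate the release flow described above, here is a minimal sketch of the routing decision a pipeline might make. It is not the team's actual workflow: the function name `deploy_target`, the branch name `development`, and the returned labels are assumptions chosen for illustration. The event names and ref formats follow what GitHub Actions exposes for pull request, branch push, and tag push events.

```python
import re
from typing import Optional

# Assumed pattern for production release tags of the form vX.Y.Z (e.g. v1.4.2).
SEMVER_TAG = re.compile(r"^refs/tags/v(\d+)\.(\d+)\.(\d+)$")

# Assumed name of the branch that auto-deploys to the dev environment.
DEV_BRANCH_REF = "refs/heads/development"


def deploy_target(event: str, ref: str) -> Optional[str]:
    """Map a CI event and git ref to a pipeline stage (hypothetical labels)."""
    if event == "pull_request":
        # Part 1: build, unit tests, commit linting, manual approvals.
        return "pr-checks"
    if event == "push" and ref == DEV_BRANCH_REF:
        # Part 2a: a merge to development deploys to the dev environment.
        return "dev"
    if event == "push" and SEMVER_TAG.match(ref):
        # Part 2b: a semver tag triggers a production release.
        return "production"
    # Anything else (other branches, non-semver tags) deploys nowhere.
    return None
```

Under these assumptions, a tag such as `v1.4.2` routes to production, while a malformed tag such as `v1.4` is ignored rather than released.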
Monitoring & Observability :
- Operate SigNoz for APM: configure service traces, metrics dashboards, and alerts across all microservices
- Manage CloudWatch log groups per service (integrated via Fluent Bit log shipping from Kubernetes)
- Maintain Grafana dashboards for infrastructure-level metrics
- Monitor Prometheus metrics exposed by backend services
- Maintain StatusPage.io public status pages for our services
- Define alerting rules and on-call runbooks; own incident response and post-mortems
Security & Secrets Management :
- Manage AWS Secrets Manager for all service credentials (MongoDB, Wasabi, application configs)
- Administer AWS Client VPN with SSO integration for secure developer access to private infrastructure
- Maintain IAM roles, policies, and service accounts following least-privilege principles
- Manage ACM certificates and ensure TLS is enforced across all ingress endpoints
- Operate ClamAV for malware scanning of user-uploaded files
- Support the SpiceDB fine-grained authorization service and its migration tooling
- Participate in compliance reviews and apply security best practices across the AWS account
Networking & Cloud Architecture :
- Manage multi-VPC architecture: separate VPCs for dev and production environments with VPC peering for controlled cross-environment access
- Configure MongoDB Atlas PrivateLink connectivity, ensuring database clusters are accessible only from within the designated VPC
- Maintain bastion host configuration for emergency database access
- Design and implement network segmentation, security group rules, and NACLs
- Manage DNS via Route53 and ALB routing rules
Collaboration with Engineering Teams :
- Partner with Go and Node.js backend engineers to containerize new services and onboard them to the deployment pipeline
- Work with frontend engineers on AWS Amplify deployments for the Nuxt.js / Vue 3 PWA
- Provide runbooks and documentation for common debugging workflows (e.g., CloudWatch log tailing, VPN access, EKS pod debugging)
- Define and enforce infrastructure standards, naming conventions, and tagging strategies across environments
Our Stack You'