Lead DevOps Engineer
sproutsai
Job Description
Multi-Cloud & Multi-Tenant Infrastructure:
- Design and manage infrastructure across AWS and GCP, ensuring consistent networking, security, and deployment patterns across both clouds.
- Architect tenant-isolated environments with secure VPC networking: private subnets with no public-facing IPs, plus VPC peering, endpoints, and VPN connectivity.
- Build and operate production Kubernetes clusters to host containerized microservices at scale.
- Define the strategy for which workloads run where — cloud vs. on-premise — based on data sensitivity, latency, and compliance requirements.
CI/CD & Deployment Governance:
- Own and evolve a centralized, modular CI/CD pipeline built on GitHub Actions as the single path to production.
- Eliminate direct developer access to production environments; implement controlled deployment workflows using session-based access tools (e.g., AWS SSM Session Manager).
- Establish branch protection, image signing, environment promotion gates, and tenant-aware deployment strategies.
On-Premise Appliance Management:
- Oversee configuration management for client-site appliances using a tool such as Chef in a client-server architecture.
- Drive the strategy to progressively centralize microservices into cloud-hosted infrastructure, minimizing the on-premise footprint.
- Define remote access procedures, failure runbooks, and contingency workflows for on-premise hardware.
Security & Compliance:
- Enforce infrastructure security best practices for a healthcare environment handling PHI and de-identified clinical data across tenant boundaries.
- Manage VPN-based access to private cloud networks and implement least-privilege IAM, secrets management, and policy-as-code across all environments.
- Ensure tenant data isolation at the network, storage, and compute layers.
Monitoring, Reliability & Backups:
- Build and maintain unified observability using Prometheus and Grafana across cloud and on-premise environments.
- Own the backup and disaster recovery strategy — container registries, automated snapshots, and cross-cloud resilience.
- Define and track SLOs for critical data pipelines and tenant-facing services.
Team & Process:
- Mentor junior DevOps/infrastructure engineers and collaborate closely with data engineering, AI, and IT teams.
- Recommend and help hire for supporting roles (e.g., IT support for on-premise hardware operations).
- Establish DevOps standards, documentation, and runbooks for the team.