Senior DevOps/MLOps Engineer
Citi
Job Description
Key Responsibilities:
- Design, implement, and maintain robust CI/CD pipelines for ML and software projects
- Support the full software development life cycle (SDLC), ensuring smooth integration, testing, deployment, and monitoring
- Build and manage ML model deployment pipelines, including containerization, versioning, rollback, and orchestration
- Automate testing, quality assurance, and performance checks for Python-based machine learning code
- Develop and maintain infrastructure-as-code solutions for repeatable and consistent environments
- Implement observability best practices, including monitoring, alerting, logging, and metrics
- Handle secrets management and enforce security practices in all DevOps processes
- Collaborate with cross-functional teams to translate business requirements into operational systems
- Identify and troubleshoot infrastructure and deployment issues, providing scalable solutions
- Document architectures, processes, and configurations clearly and concisely
Qualifications:
Must-Have Technical Skills:
- Strong experience with general DevOps tooling and practices
- Proficient in Python, with experience in testing frameworks (e.g., pytest)
- Deep knowledge of CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI)
- Familiarity with SDLC processes, change control, and release management
- Hands-on experience with ML pipeline orchestration tools (e.g., MLflow, Airflow, Kubeflow)
- Experience with Lightspeed for scalable ML workflows
- Proficient with Helm for Kubernetes application packaging and deployment
- Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK)
- Solid understanding of secrets management (e.g., HashiCorp Vault, AWS Secrets Manager, CyberArk)
- Strong SQL skills for data validation and diagnostics
- Proficient with Git, shell scripting, and Linux environments
- Familiarity with containerization and orchestration (e.g., Docker, Kubernetes)
Soft Skills & Business Acumen:
- Strong problem-solving and debugging skills
- High adaptability and comfort working with ambiguity
- Ability to translate loosely defined business needs into technical solutions
- Excellent communication skills for both technical and non-technical stakeholders
- A collaborative mindset and proactive attitude toward improvement
Nice to Have:
- Experience deploying ML models in production at scale
- Familiarity with cloud platforms (AWS, GCP, or Azure)
- Exposure to data governance, access control, and compliance in ML workflows
- Understanding of feature stores and model registries