Machine Learning Ops Engineer
Inovalon
Job Description
Key responsibilities
- Design, implement, and maintain CI/CD pipelines for ML models and data workflows using AWS-native services and infrastructure-as-code.
- Operationalize models built on SageMaker, Bedrock, and Snowflake Cortex, including feature pipelines, training, batch/real-time inference, and monitoring.
- Build and manage data pipelines and feature stores using services such as AWS Glue, Lambda, Step Functions, and Snowflake.
- Implement observability for ML systems (logging, metrics, tracing, drift/quality monitoring) and establish SLOs/SLAs for production ML services.
- Automate environment provisioning, configuration, and dependency management across dev, test, and production.
- Partner with security and compliance teams to ensure ML workloads meet healthcare, privacy, and regulatory standards (e.g., HIPAA).
- Collaborate with ML engineers and data scientists to productionize notebooks and prototypes into robust, maintainable services.
- Contribute to best practices, standards, and documentation for ML platform and operations across the organization.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, or a related field.
- 4+ years of experience in software engineering, data engineering, or ML engineering, including at least 2 years focused on MLOps or ML platform work.
- Strong proficiency with Python and experience integrating ML libraries or frameworks (e.g., scikit-learn, TensorFlow, PyTorch) into production workflows.
- Hands-on expertise with AWS services relevant to MLOps: SageMaker, Bedrock, IAM, CloudWatch, ECR, ECS/EKS or Lambda, S3, Step Functions, and Glue.
- Experience with Snowflake (including Snowflake Cortex), SQL, and building secure, performant data pipelines into and out of Snowflake.
- Proficiency with CI/CD tools (e.g., GitHub Actions, GitLab CI, CodePipeline) and infrastructure-as-code (e.g., Terraform, CloudFormation, CDK).
- Familiarity with containerization and orchestration (Docker, Kubernetes) and event streaming tools (e.g., Kafka) is a plus.
- Knowledge of software engineering best practices, including testing, code reviews, version control, and design for reliability and scalability.
- Experience in regulated domains or with healthcare data standards and regulations (e.g., HIPAA, FHIR, HL7) is a plus.
Soft skills and benefits
- Excellent problem-solving and analytical skills with a focus on reliability and automation.
- Strong communication and collaboration abilities, including working cross-functionally with engineering, data science, and product teams.
- Ability to work independently in a fast-paced environment.
- Competitive salary and benefits package.
- Opportunity to work on impactful ML platforms that improve healthcare outcomes.
- Collaborative, innovative environment with professional development and growth opportunities.