Senior Machine Learning Engineer, Digital Products
biospace
Job Description
- Design and implement scalable ML pipelines for data collection, storage, and processing to support enterprise-wide data needs.
- Implement and maintain data governance frameworks and data quality checks within data pipelines to ensure compliance and reliability.
- Build and optimize data models and data pipelines to support self-service analytics and reporting tools such as Tableau, Looker, and Power BI.
- Collaborate with data scientists to operationalize machine learning models by integrating them into production data pipelines and ensuring scalability and performance.
- Develop and manage ETL/ELT workflows and orchestration using tools like Airflow or AWS Step Functions to ensure efficient data movement and transformation.
- Implement CI/CD practices for ML models and data pipelines, including automated testing, containerization, and deployment.
Who is USP Looking For?
The successful candidate will have a demonstrated understanding of our mission, commitment to excellence through inclusive and equitable behaviors and practices, ability to quickly build credibility with stakeholders, along with the following competencies and experience:
Education
Bachelor’s degree in relevant field (e.g. Engineering, Analytics or Data Science, Computer Science, Statistics) or equivalent experience.
Experience
- 3+ years of experience designing, building, and optimizing large-scale data platforms and pipelines for structured, semi-structured, and unstructured data.
- Expert in ETL/ELT workflows, data ingestion (streaming, batch, APIs), and distributed processing using Apache Spark, PySpark, Airflow, Glue, and modern orchestration frameworks.
- Strong experience architecting and integrating data across heterogeneous systems, including data lakes and warehouses (AWS S3, Redshift, Snowflake, Delta Lake).
- Deep knowledge of data quality frameworks, data governance, metadata management, and SQL optimization for analytical workloads.
- Advanced Python/PySpark skills with hands-on experience in data processing and API development (FastAPI, Flask, Django).
- Deep expertise in AWS services including S3, RDS, Redshift, Lambda, Step Functions, SageMaker, EC2/ECR, CloudWatch, ALB/NLB, and autoscaling.
- Strong foundation in ML system design, feature stores, model registries, A/B testing, and deploying ML models with high availability and autoscaling.
- Knowledge of GenAI/LLM workloads, including RAG pipelines, vector stores, chunking, prompt engineering, and embeddings, is a plus.
- Skilled in containerization and orchestration using Docker and Kubernetes for scalable data and ML deployments.
- Experience with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automated deployments.
- Strong collaboration skills with cross-functional teams including product, architecture, engineering, and business stakeholders.
Additional Desired Preferences
- Experience with scientific chemistry nomenclature, prior work experience in life sciences, chemistry, or hard sciences, or a degree in the sciences.
- Experience with pharmaceutical datasets and nomenclature.
- Experience developing machine learning and deep learning models.
- Ability to explain complex technical issues to a non-technical audience.
- Strong communication skills required: verbal, written, and interpersonal.