Senior Machine Learning Engineer, Digital Products

biospace

Hyderabad, 3 Years Exp, Posted 41d ago

Job Description

  • Design and implement scalable ML pipelines for data collection, storage, and processing to support enterprise-wide data needs.

  • Implement and maintain data governance frameworks and data quality checks within data pipelines to ensure compliance and reliability.

  • Build and optimize data models and data pipelines to support self-service analytics and reporting tools such as Tableau, Looker, and Power BI.

  • Collaborate with data scientists to operationalize machine learning models by integrating them into production data pipelines and ensuring scalability and performance.

  • Develop and manage ETL/ELT workflows and orchestration using tools like Airflow or AWS Step Functions to ensure efficient data movement and transformation.

  • Implement CI/CD practices for ML models and data pipelines, including automated testing, containerization, and deployment.

Who is USP Looking For?

The successful candidate will have a demonstrated understanding of our mission, commitment to excellence through inclusive and equitable behaviors and practices, ability to quickly build credibility with stakeholders, along with the following competencies and experience:

Education

Bachelor’s degree in a relevant field (e.g., Engineering, Analytics or Data Science, Computer Science, Statistics) or equivalent experience.

Experience

  • 3+ years of experience designing, building, and optimizing large-scale data platforms and pipelines for structured, semi-structured, and unstructured data.

  • Expert in ETL/ELT workflows, data ingestion (streaming, batch, APIs), and distributed processing using Apache Spark, PySpark, Airflow, Glue, and modern orchestration frameworks.

  • Strong experience architecting and integrating data across heterogeneous systems, including data lakes and warehouses (AWS S3, Redshift, Snowflake, Delta Lake).

  • Deep knowledge of data quality frameworks, data governance, metadata management, and SQL optimization for analytical workloads.

  • Advanced Python/PySpark skills with hands-on experience in data processing and API development (FastAPI, Flask, Django).

  • Deep expertise in AWS services including S3, RDS, Redshift, Lambda, Step Functions, SageMaker, EC2/ECR, CloudWatch, ALB/NLB, and autoscaling.

  • Strong foundation in ML system design, feature stores, model registries, A/B testing, and deploying ML models with high availability and autoscaling.

  • Knowledge of GenAI/LLM workloads, including RAG pipelines, vector stores, chunking, prompt engineering, and embeddings, is a plus.

  • Skilled in containerization and orchestration using Docker and Kubernetes for scalable data and ML deployments.

  • Experience with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automated deployments.

  • Strong collaboration skills with cross-functional teams including product, architecture, engineering, and business stakeholders.

Additional Desired Preferences

  • Experience with scientific chemistry nomenclature, prior work experience in life sciences, chemistry, or other hard sciences, or a degree in the sciences

  • Experience with pharmaceutical datasets and nomenclature

  • Experience in developing machine learning and deep learning models

  • Ability to explain complex technical issues to a non-technical audience

  • Strong communication skills required: verbal, written, and interpersonal
