Senior Machine Learning Engineer

Expedia Group

Bangalore 8 Years Exp Posted 33d ago

In this role, you will:

Design and own high-throughput, low-latency ML systems (2000+ RPS) for TravelAds, including multi-service training and serving architectures, auction and ranking models, and real-time inference services that meet strict sub-100ms SLAs.
Build and evolve ML infrastructure and data foundations – feature stores, online/offline feature pipelines, embedding and vector services, and data lineage and versioning – that power ad relevance, bidding optimization, experimentation, and model evaluation at scale.
Accelerate the end-to-end ML lifecycle by automating training, validation, deployment, shadow testing, A/B testing, and retraining using orchestrated workflows (e.g., Flyte, Airflow) and robust quality gates.
Develop agentic AI and LLM/RAG-powered workflows that automate ML operations (training, deployment, validation, monitoring, calibration) and enable AI-assisted dataset creation, operational analysis, and decision support.
Define and implement ML observability, reliability, and cost guardrails through drift and feature-freshness monitoring, health dashboards, SLO/SLI definitions, incident response, and resilience-focused improvements.
Safely integrate and operate AI/ML-enabled solutions that improve outcomes, while setting technical direction, mentoring MLEs to operate independently, and leading cross-team initiatives that elevate ML engineering practices and business impact.

Minimum Qualifications:

Bachelor’s degree in Computer Science or a related technical field, or equivalent related professional experience.
8+ years of relevant professional experience.
Proven track record of designing, building, and operating production ML or large-scale distributed systems, including system design (HLD/LLD), serving stacks, monitoring and observability, rollbacks, and operational rigor.
Strong software engineering foundation in Python and at least one of Java/Kotlin/Scala, with a deep understanding of distributed systems, data structures, and performance optimization.
Experience leading technical design for multi-quarter ML projects and partnering with Product and business stakeholders to define problems, make clear trade-offs, and measure the business impact of ML systems.

Preferred Qualifications:

Experience with real-time ML inference at high throughput (1000+ RPS) under strict latency SLAs.
Expertise with big data technologies such as Spark, Hive, and Databricks, and with workflow orchestration tools such as Airflow and Flyte, as well as cloud-native ML platforms and infrastructure (e.g., AWS SageMaker, EKS, EMR, Docker).
Experience building ML lifecycle automation – CI/CD for ML, automated training pipelines, deployment orchestration, and robust data lineage and versioning – plus ML observability systems including drift detection, feature-freshness monitoring, model health dashboards, and offline/online parity validation.
Track record of leading incident response and root cause analysis for ML or other mission-critical services, and driving sustained improvements in reliability, resilience, and operational excellence.
Familiarity with AI-driven systems, tools, or workflows, and with applying AI/ML concepts to improve real-world products and engineering outcomes, including experience with LLM productionization, RAG architectures, or agentic AI workflows in high-scale environments.