Sr Software Engineer, Data & AI Platform

Dolby

Bangalore

Job Description

Key Responsibilities:

  • Design and build platform primitives (Python SDKs, platform APIs, and templates) that enable reproducible experiments, configuration-as-code workflows, model lineage, and artifact tracking, allowing seamless promotion from research to production.
  • Create developer tools (CLIs, UIs, dashboards, visualization layers) that improve the developer experience and simplify platform operation and multi-stage workflows.
  • Implement and scale distributed training systems (multi-node GPU workloads) on top of a Kubernetes and cloud-based orchestration foundation.
  • Build large-scale evaluation frameworks for offline tests, shadow deployments, and A/B experimentation.
  • Implement model/dataset versioning, approvals, lineage tracking, retention, and compliance hooks.
  • Partner with AI/ML research, platform engineering/MLOps and infrastructure, and data engineering teams to generalize workflows into reusable frameworks.
  • Partner with platform engineering/MLOps and infrastructure to define observability stacks for metrics, drift indicators, performance regressions, training/inference health signals, production reliability (SLIs/SLOs), monitoring, and incident response.

What you need to succeed

Desired Background:

  • BS in Computer Science, Mathematics, Engineering, or equivalent technical field. Master’s preferred.
  • Proven track record building large-scale distributed systems and integrated data and AI/ML platforms (e.g., training, serving, workflow orchestration, data pipelines).
  • Expert-level proficiency in Python and one of Go/Java/C++, and in building production-grade services/APIs/SDKs.
  • Extensive hands-on experience with Kubernetes (EKS, GKE, self-hosted, etc.), including autoscaling and job scheduling frameworks, GPU infrastructure, and AI/ML-related AWS/GCP managed services (Vertex AI, SageMaker, etc.).
  • Deep expertise with the AI/ML ecosystem and tooling, such as PyTorch, TensorFlow, Ray, Hugging Face, and experiment/feature/model stores (MLflow, W&B, Feast, etc.).
  • Proven ability to scale AI/ML workloads and pipelines: pipeline SDKs, feature/model CI/CD, automated evaluation, safe rollouts, and monitoring.
  • Strong developer-experience mindset: ability to translate research/engineering friction into elegant APIs, templates, and tools that reduce time-to-first-successful remote run and raise platform adoption.

Preferred Skills:

  • Previous experience with Databricks.
  • Knowledge of multimodal AI/ML (audio, video, text) data preparation, feature extraction, model development, training, and evaluation workflows.
  • Experience with LLM/foundation model sizing/estimation, training requirements, pipelines, and deployment.
  • Knowledge of LLM/foundation model sizing/estimation, training requirements, evaluation workflows, and orchestration and deployment patterns.
  • Experience designing feature stores or embedding services tightly integrated with training pipelines.
