Staff Machine Learning Engineer
wbd
Job Description
You will be part of a team focused on model retraining, model hosting, cost optimization, and the management of production workflows at scale.
- Build and maintain pipelines for model fine-tuning and retraining, including LoRA-based workflows
- Integrate and maintain vector search services and semantic similarity infrastructure
- Design scalable model serving solutions for open-source and foundation models
- Develop systems for experiment tracking, model versioning, and evaluation
- Monitor production models for drift and performance degradation
- Manage compute cost and resource optimization across distributed training jobs
- Integrate Human-in-the-Loop (HITL) workflows and offline labeling into training pipelines
- Support model deployment for varied model architectures, including Vision-Language Models, Convolutional Neural Nets, and Embedding Generation models
- Stand up and maintain Feature Store and data versioning infrastructure
- Architect and implement RAG pipelines for video metadata, summarization, and Q&A
- Build evaluation frameworks to assess LLM performance, hallucination frequency, and structured response accuracy
What to Bring:
- 9+ years of experience in machine learning engineering, with end-to-end ML workflow expertise
- Strong background in model retraining, fine-tuning, and evaluation techniques
- Experience deploying and managing open-source model servers (e.g., Triton, TorchServe, Ray Serve)
- Proficiency in managing cost-effective distributed computing environments (e.g., Kubernetes, Ray, SageMaker)
- Familiarity with experiment tracking tools (e.g., MLflow, Weights & Biases) and model versioning strategies
- Deep understanding of ML domains including NLP, RecSys, and reinforcement learning
- Experience with real-time inference systems and streaming data pipelines (a plus)
- Familiarity with labeling tools, HITL workflows, and offline data curation strategies
- Comfort working in Agile development environments and collaborating across global teams
-