Applied AI Engineer

ycombinator

Bengaluru, India 3 Years Exp Posted 31d ago

Job Description

What you'll do

  • Build and maintain the eval framework that scores voice agent quality end-to-end: transcription, response quality, TTS, and full-conversation outcomes
  • Design voice agent behavior: system prompts, tool use, conversation flow, error recovery, and guardrails for real-time interactions
  • Drive transcription accuracy improvements across STT providers and configurations (Deepgram, Whisper, AssemblyAI, NVIDIA, etc.)
  • Drive TTS quality improvements: voice selection, latency vs. fidelity tradeoffs, prosody, edge cases
  • Curate and grow our evaluation datasets, including hard-case mining from production traffic
  • Run rigorous A/B experiments and report results that the team can actually act on
  • Partner with backend engineers to wire eval signals into CI so regressions get caught before they ship

Must-haves

  • ML engineering experience shipping production systems
  • Strong Python and a working ML stack (PyTorch, Hugging Face, pandas, scikit-learn)
  • Hands-on experience designing LLM-based agents: prompting, tool/function calling, multi-turn state, structured outputs
  • Hands-on experience building evals or eval frameworks for ML, LLM, or voice systems, including LLM-as-judge eval pipelines and knowledge of their failure modes
  • Practical experience with ASR/STT: comparing providers, fine-tuning, or running open models like Whisper
  • Practical experience with TTS systems (ElevenLabs or open models)
  • Comfortable working with audio data: sample rates, codecs, noise, alignment

Nice-to-haves

  • Designed voice agents specifically: handled barge-in, interruption recovery, disfluencies, and natural turn-taking at the prompt/behavior layer
  • Experience with diarization, VAD, or endpointing models
  • Audio dataset curation, labeling, or annotation pipelines
  • Trained or fine-tuned ASR or TTS models from scratch or on domain audio
  • Experience with active learning or data-flywheel patterns over production traffic
  • Open-source contributions to AI/ML frameworks
  • Familiarity with cost/latency tradeoffs across model providers for real-time voice
