Infrrd.ai - Senior Data Scientist - LLM/Artificial Intelligence

Bangalore | 8 Years Exp | Posted 43d ago

Job Description

Job Duties and Responsibilities :
 

 

- Design and build agentic evaluation pipelines : error detection, root cause hypothesis generation, prompt variant testing, A/B measurement, and production promotion, with minimal human intervention.

 

- Own the accuracy measurement infrastructure : Automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.

 

- Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms - classification and extraction correction loops, NTP rule generation, performance reporting.

 

- Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.

 

- Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.

 

- Design and maintain A/B testing infrastructure for prompt and model changes - no untested changes go to production.

 

- Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.

 

- Optimize LLM costs while maintaining quality : prompt compression, output token minimization, model selection and migration strategies.

 

- Write production-grade data pipelines with error handling, retries, logging, and monitoring.

 

- Collaborate with platform engineering and applied research functions on architecture and methodology translation.

 

- Mentor 1-2 junior engineers; build tooling and documentation they can operate independently.

 

Required Qualifications :

 

- BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.

 

Experience Range :

 

- 8-10 years total; minimum 4-6 years building production LLM or AI systems; minimum 4-6 years in evaluation, quality measurement, or accuracy improvement work.

 

"Must-have" Skills :

 

- Production-grade Python - clean, tested, maintainable systems; not just scripts (pytest, FastAPI or Flask)

 

- Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) with systematic, measurement-driven prompt engineering - methodology over instinct

 

- Agentic pipeline design - multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops

 

- Evaluation framework design for LLM systems - precision/recall/F1, confusion matrices, A/B testing, per-class error analysis

 

- Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type

 

- MongoDB or equivalent NoSQL - queries, aggregations, indexing

- pandas / NumPy for data processing and batch analysis

 

- Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)

 

- Clear written communication - able to explain model behaviour and accuracy findings to non-technical stakeholders

 

"Would-be-nice" Skills :

 

- Document AI : PDF parsing, layout-aware extraction, OCR, structured form extraction

 

- RAG pipeline design and vector search (Pinecone, Weaviate, or similar)

 

- Classification systems with large label spaces (50+ classes)

 

- Async Python (asyncio, aiohttp) for pipeline throughput

 

- Embedding models and semantic similarity for document matching

 

- Prior experience working alongside a Research or Applied Science team as the engineering counterpart

 

Working Knowledge (Tools) :

 

- Python, FastAPI / Flask, MongoDB, Git, GitHub Actions / Jenkins, LLM APIs (OpenAI / Anthropic / Gemini or equivalent), LangChain / LlamaIndex, pandas / NumPy, pytest, Docker

 

General Knowledge :

 

- NLP concepts, LLM prompt engineering patterns, REST APIs, RAG pipelines, vector databases, JSON data structures
