AI / ML Engineer

recrew

Bengaluru, India | 5 Years Exp | Posted 1h ago

Embed with client product and engineering teams to architect and ship production-grade LLM-powered features end-to-end
Build and optimize RAG pipelines with advanced chunking strategies, hybrid search, re-ranking, and vector database management (Pinecone, Milvus, Qdrant, or ChromaDB)
Develop multi-agent systems and autonomous workflows with tool use, self-correction, and complex task execution using LangGraph, CrewAI, or equivalent agentic frameworks
Fine-tune open-source LLMs (LLaMA, Mistral, or equivalent) using LoRA/QLoRA and implement 4-bit/8-bit quantization for cost-effective client deployment
Set up and maintain production AI infrastructure including vLLM-based model serving, containerized deployments via Docker/Kubernetes, and continuous evaluation pipelines
Implement AI safety and guardrail layers to mitigate hallucinations, enforce PII data protection, and monitor token usage and inference costs within client environments
Transform raw, unstructured client data into high-value AI features in close collaboration with client-side Data Engineering and Product teams

Must Have Criteria

5+ years of Python engineering experience with production REST API development using FastAPI or Flask
2+ years of hands-on experience building LLM applications with LangChain, LlamaIndex, or LangGraph and shipping them to production
Demonstrated experience building and optimizing RAG pipelines including hybrid search, re-ranking, and vector DB management (Pinecone, Milvus, Qdrant, or ChromaDB)
Hands-on experience fine-tuning open-source models (LLaMA, Mistral, or equivalent) using LoRA/QLoRA with Hugging Face Transformers and PyTorch
Experience deploying and serving LLMs in production using Docker, Kubernetes, and vLLM or equivalent serving frameworks
Working knowledge of AWS Bedrock or Google Vertex AI for managed model deployment and inference
Experience with observability and evaluation tooling (LangSmith, Weights & Biases, or Arize Phoenix) in a live production AI context