AI / ML Engineer
recrew
Job Description
- Embed with client product and engineering teams to architect and ship production-grade LLM-powered features end-to-end
- Build and optimize RAG pipelines with advanced chunking strategies, hybrid search, re-ranking, and vector database management (Pinecone, Milvus, Qdrant, or ChromaDB)
- Develop multi-agent systems and autonomous workflows with tool use, self-correction, and complex task execution using LangGraph, CrewAI, or equivalent agentic frameworks
- Fine-tune open-source LLMs (LLaMA, Mistral, or equivalent) using LoRA/QLoRA and implement 4-bit/8-bit quantization for cost-effective client deployment
- Set up and maintain production AI infrastructure including vLLM-based model serving, containerized deployments via Docker/Kubernetes, and continuous evaluation pipelines
- Implement AI safety and guardrail layers to mitigate hallucinations, enforce PII protection, and monitor token usage and inference costs within client environments
- Transform raw, unstructured client data into high-value AI features in close collaboration with client-side Data Engineering and Product teams
Must-Have Criteria
- 5+ years of Python engineering experience with production REST API development using FastAPI or Flask
- 2+ years of hands-on experience building LLM applications with LangChain, LlamaIndex, or LangGraph and shipping them to production
- Demonstrated experience building and optimizing RAG pipelines including hybrid search, re-ranking, and vector DB management (Pinecone, Milvus, Qdrant, or ChromaDB)
- Hands-on experience fine-tuning open-source models (LLaMA, Mistral, or equivalent) using LoRA/QLoRA with Hugging Face Transformers and PyTorch
- Experience deploying and serving LLMs in production using Docker, Kubernetes, and vLLM or equivalent serving frameworks
- Working knowledge of AWS Bedrock or Google Vertex AI for managed model deployment and inference
- Experience with observability and evaluation tooling (LangSmith, Weights & Biases, or Arize Phoenix) in a live production AI context