Senior AI Engineer
instahyre
Job Description
Responsibilities:
- RAG pipeline ownership: Ideate, architect, build, and deploy end-to-end RAG systems, from scratch through production.
- Embedding systems: Select, evaluate, and fine-tune embedding models; manage vector stores; and optimize retrieval quality.
- Advanced chunking: Implement late chunking and other segmentation strategies to maximize context fidelity and retrieval precision.
- Multi-source data integration: Connect and ingest from diverse sources, including SQL/NoSQL databases, PDFs, web content, Confluence, SharePoint, and real-time APIs.
- Chatbot integration: Embed RAG and LLM components into conversational AI products using LangChain, LlamaIndex, or custom orchestration layers.
- Evaluation and quality: Own retrieval evaluation frameworks (RAGAS, RAG triad evaluations) and iterate on pipelines based on precision, recall, and relevance metrics.
- Deployment and observability: Deploy and monitor LLM services on cloud infrastructure with robust logging, alerting, and MLOps practices.
- Collaboration: Partner with product and engineering teams to deliver low-latency, reliable AI experiences at scale.
Requirements:
- 4+ years of total engineering experience, including 2+ years in LLM or NLP engineering.
- Hands-on experience designing and deploying RAG systems end-to-end.
- Deep familiarity with embedding models (OpenAI Ada, Cohere, BGE, E5) and vector databases (Pinecone, Weaviate, Chroma, pgvector).
- Strong command of Python and LLM orchestration frameworks (LangChain, LlamaIndex, Haystack).
- Experience working with multiple data source types: structured, unstructured, and real-time.
- Practical knowledge of late chunking and other advanced retrieval strategies.
- Familiarity with cloud deployment (AWS / GCP / Azure) and containerization (Docker, Kubernetes).
- Strong problem-solving instincts and a bias for building things that work in production.
What You'll Own / Deliver:
- End-to-end RAG pipelines from data source to retrieval to response.
- Embedding infrastructure and vector store management.
- Integration of LLM components into live chatbot and AI products.
- Evaluation and continuous improvement of retrieval quality.
- Technical documentation and knowledge sharing across the team.
Good to Have:
- Experience with agentic frameworks (AutoGen, CrewAI, or custom agents).
- Exposure to graph-based RAG or knowledge graph integration.
- Open-source contributions in the AI/ML space.