Senior GenAI Engineer

mlp

Bangalore 5 Years Exp Posted 17d ago

Data "AI-Readiness": Build pipelines to ingest and normalize complex documents (PDFs, Transcripts, Filings). You will implement advanced parsing logic to accurately extract tables, hierarchical headers, and embedded charts.
Enrichments and Knowledge Graph Construction: Move beyond flat vector search by building GraphRAG systems and advanced annotations such as topics, keywords, and sentiment. You will extract entities (Companies, People, Metrics) and relationships from text to build a dynamic Knowledge Graph that captures the nuances of financial markets and their temporal aspects.
Advanced RAG Orchestration: Implement state-of-the-art RAG techniques, including:
- Contextual Chunking: Semantic and agentic chunking strategies that preserve document context.
- Multi-Stage Retrieval: Hybrid search (Keyword + Vector) and re-ranking pipelines.
- Query Transformation: Query expansion (multi-query), decomposition, and rewriting to handle complex investment prompts.
Graph-Vector Hybrid Systems: Combine graph traversal (Cypher/Gremlin) with vector similarity to provide holistic context to the LLM.
Evaluation & Observability: Build RAG evaluation frameworks (e.g., using Ragas or TruLens) to measure faithfulness, relevance, and hallucination rates in an investment-grade environment.

Required Technical Skills

Programming: Mastery of Python (for AI/ML workflows) and Java (for high-throughput backend services).
LLM Frameworks: Deep experience with LangChain, LlamaIndex, Haystack, etc.
Graph Technologies: Proficiency in graph databases (e.g., Neo4j, AWS Neptune) and GraphRAG implementation patterns.
Document Intelligence: Experience with OCR and parsing tools (e.g., Unstructured.io, LlamaParse, AWS Textract, or LayoutLM).
Vector Databases: Expertise in Pinecone, Milvus, Weaviate, Chroma, etc.
Pipeline Engineering: Experience building high-throughput data platforms (Kafka, Spark) to process millions of tokens in real time.