Senior GenAI Engineer

mlp

Bangalore 5 Years Exp Posted 17d ago

Job Description

  • Data "AI-Readiness": Build pipelines to ingest and normalize complex documents (PDFs, Transcripts, Filings). You will implement advanced parsing logic to accurately extract tables, hierarchical headers, and embedded charts.

  • Enrichments and Knowledge Graph Construction: Move beyond flat vector search by building GraphRAG systems and advanced annotations such as topics, keywords, and sentiment. You will extract entities (Companies, People, Metrics) and relationships from text to build a dynamic Knowledge Graph that captures the nuance of financial markets and their temporal aspects.

  • Advanced RAG Orchestration: Implement state-of-the-art RAG techniques, including:

    • Contextual Chunking: Semantic and agentic chunking strategies that preserve document context.

    • Multi-Stage Retrieval: Hybrid search (Keyword + Vector) and re-ranking pipelines.

    • Query Transformation: Query expansion (multi-query), decomposition, and rewriting to handle complex investment prompts.

  • Graph-Vector Hybrid Systems: Combine graph traversal (Cypher/Gremlin) with vector similarity to provide holistic context to the LLM.

  • Evaluation & Observability: Build "RAG Evaluation" frameworks (e.g., Ragas, TruLens) to measure faithfulness, relevance, and hallucination rates in an investment-grade environment.

 

Required Technical Skills

  • Programming: Mastery of Python (for AI/ML workflows) and Java (for high-throughput backend services).

  • LLM Frameworks: Deep experience with LangChain, LlamaIndex, Haystack, etc.

  • Graph Technologies: Proficiency in graph databases (e.g., Neo4j, Amazon Neptune) and GraphRAG implementation patterns.

  • Document Intelligence: Experience with OCR and parsing tools (e.g., Unstructured.io, LlamaParse, Amazon Textract, or LayoutLM).

  • Vector Databases: Expertise in Pinecone, Milvus, Weaviate, Chroma, etc.

  • Pipeline Engineering: Experience building high-throughput data platforms (Kafka, Spark) to process millions of tokens in real time.
