Senior GenAI Engineer
mlp
Job Description
-
Data "AI-Readiness": Build pipelines to ingest and normalize complex documents (PDFs, Transcripts, Filings). You will implement advanced parsing logic to accurately extract tables, hierarchical headers, and embedded charts.
-
Enrichments and Knowledge Graph Construction: Move beyond flat vector search by building GraphRAG systems and advanced annotations such as topics, keywords, and sentiment. You will extract entities (Companies, People, Metrics) and relationships from text to build a dynamic Knowledge Graph that captures the nuance of financial markets and their temporal aspects.
-
Advanced RAG Orchestration: Implement state-of-the-art RAG techniques, including:
-
Contextual Chunking: Semantic and agentic chunking strategies that preserve document context.
-
Multi-Stage Retrieval: Hybrid search (Keyword + Vector) and re-ranking pipelines.
-
Query Transformation: Query expansion (multi-query), decomposition, and rewriting to handle complex investment prompts.
-
Graph-Vector Hybrid Systems: Combine graph traversal (Cypher/Gremlin) with vector similarity to provide holistic context to the LLM.
-
Evaluation & Observability: Build RAG evaluation frameworks (e.g., with Ragas or TruLens) to measure faithfulness, relevance, and hallucination rates in an investment-grade environment.
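For candidates unfamiliar with the term, the hybrid search item under Multi-Stage Retrieval can be illustrated with a minimal sketch. It assumes the keyword and vector retrievers each return a ranked list of document IDs and merges them with reciprocal rank fusion, which is one common fusion choice and not necessarily the one used in this role. The function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["10-K_2023", "earnings_call_Q4", "press_release"]
vector_hits = ["earnings_call_Q4", "analyst_note", "10-K_2023"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top of the fused list, which could then be passed to a re-ranking stage.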
Required Technical Skills
-
Programming: Mastery of Python (for AI/ML workflows) and Java (for high-throughput backend services).
-
LLM Frameworks: Deep experience with LangChain, LlamaIndex, Haystack, or similar frameworks.
-
Graph Technologies: Proficiency in Graph Databases (e.g., Neo4j, AWS Neptune) and GraphRAG implementation patterns.
-
Document Intelligence: Experience with OCR and parsing tools (e.g., Unstructured.io, LlamaParse, AWS Textract, or LayoutLM).
-
Vector Databases: Expertise in Pinecone, Milvus, Weaviate, Chroma, or similar vector stores.
-
Pipeline Engineering: Experience building high-throughput data platforms (Kafka, Spark) to process millions of tokens in real time.