Lead AI/ML Engineer (EVP) - ODEX - May-26
ripplehire
Job Description
- Define and own the end-to-end architecture for generative AI systems across multiple use cases and teams
- Establish and enforce standards for RAG, agent architectures, prompt and version management, evaluation, observability, and deployment
- Decide when to build, buy, fine-tune, or replace models, tools, and frameworks based on technical and business constraints
- Design, evolve, and govern shared AI platforms, including reusable RAG pipelines, agent orchestration frameworks, prompt management systems, and evaluation/monitoring infrastructure
- Drive reuse and standardization, eliminating one-off AI solutions and reducing long-term technical debt
- Architect complex AI workflows, including multi-agent systems, tool orchestration, and long-running or asynchronous tasks
- Design AI systems resilient to hallucinations, noisy inputs, partial failures, and model degradation
- Optimize AI systems for latency, cost, reliability, scalability, and explainability at production scale
- Lead technical design reviews, act as a technical authority, and unblock complex architectural and implementation challenges
- Mentor senior and junior engineers and raise the technical bar across the generative AI stack
- Define and enforce guardrails for data security, privacy, compliance, and responsible AI usage
- Proactively identify model risks, operational failure modes, and scaling bottlenecks
- Translate long-term business and product goals into concrete, extensible AI platform capabilities
- Design, build, and optimize retrieval-augmented generation (RAG) pipelines using vector databases (e.g., Qdrant, Pinecone, FAISS) to power semantic search and intelligent document workflows
- Fine-tune and adapt LLMs using Hugging Face Transformers, LoRA/PEFT, DeepSpeed, or Accelerate where domain adaptation is required
- Integrate knowledge graphs (e.g., Neo4j, AWS Neptune) into agent pipelines for enhanced context, reasoning, and relationship modeling
- Implement cache-augmented generation strategies (semantic caching, Redis, vector similarity) to reduce latency, cost, and output inconsistency
- Build and maintain scalable backend services using FastAPI or Flask, and support lightweight user interfaces or prototypes using Streamlit, Gradio, or React when needed
- Monitor and evaluate model and agent performance using prompt testing, benchmarks, human-in-the-loop feedback, observability tools, and safe AI practices
- Stay current with advancements in cloud platforms (AWS/GCP/Azure), LLMs, agentic frameworks, and AI infrastructure, and incorporate improvements where appropriate
Prerequisites:
- Strong Python development skills, including API development and service integration
- Proven track record of designing and scaling AI systems used by real teams or clients
- Expert-level Python and strong software engineering fundamentals
- Deep, hands-on expertise with LLM APIs and open-source models, RAG architecture and vector search strategies, agent-based systems and tool calling, and prompt engineering at scale
- Experience with model fine-tuning, adapters, or hybrid architectures
- Strong background in distributed systems and API design; Docker, CI/CD, and cloud infrastructure; and async workflows, queues, and background processing
- Experience implementing observability for AI systems (metrics, logs, tracing, cost monitoring)
Experience:
- 6–10 years of experience in AI/ML, with at least 2 years focused on large language models, applied NLP, or agent-based systems
- Demonstrated ability to build and ship real-world AI-powered applications or platforms, preferably involving agents or LLM-centric workflows
- Strong analytical, problem-solving, and communication skills
- Ability to work independently in a fast-moving, collaborative, and cross-functional environment
- Prior experience in startups, innovation labs, or consulting firms is a plus
- Experience with AI governance, model audits, and compliance frameworks is a plus