Senior AI Platform Engineer - LLM/RAG
Job Description
- Design, build, and maintain AI agent workflows and orchestration patterns using frameworks like LangGraph.
- Build and optimize production-grade RAG systems: chunking strategies, retrieval pipelines, embedding generation, and response quality.
- Implement and manage LLM API integrations with proper retry logic, fallbacks, rate limiting, error handling, and cost optimization.
- Develop prompt engineering solutions including system prompts, few-shot patterns, structured outputs (JSON mode), and prompt versioning.
- Build input validation, output filtering, and content safety layers for production AI systems.
- Implement AI observability and evaluation: tracing, automated quality checks, and regression testing for AI outputs.
- Build data ingestion pipelines for AI systems (document processing, embedding generation, vector storage).
- Collaborate with architects and product teams to translate AI capabilities into reliable platform features.
- Write well-tested, production-ready code with unit tests, integration tests, and AI-specific tests.
- Contribute to design docs, runbooks, and technical documentation for AI systems.
Requirements:
Must Have:
- 5+ years of professional software development experience.
- Hands-on experience building AI/ML features or LLM-powered systems in production.
- Strong understanding of LLM fundamentals: tokens, context windows, embeddings, temperature, and similarity search.
- Experience with RAG systems: indexing strategies, retrieval methods, chunking, and judging when RAG is the right approach.
- Practical prompt engineering skills: chain-of-thought, few-shot, structured outputs, and systematic iteration.
- Experience with at least one AI orchestration framework (LangGraph, LangChain, or equivalent).
- Working experience with at least one vector database (pgvector, Pinecone, Qdrant, Weaviate, or similar).
- Proficiency in Python for AI development and prototyping.
- Experience with LLM APIs (Claude SDK, OpenAI SDK, or similar) in production.
- Understanding of AI safety basics: prompt injection, jailbreaking, and practical mitigation approaches.
- Working knowledge of cloud platforms (AWS preferred) and containerization (Docker, Kubernetes).
- Strong problem-solving skills and ability to break down ambiguous AI problems into implementable tasks.
Nice to Have:
- Experience with AI observability tools (LangSmith, LangFuse) for tracing and debugging.
- Experience with MCP (Model Context Protocol) or equivalent tool integration patterns.
- Familiarity with fine-tuning concepts and when to consider fine-tuning vs. prompt-based solutions.
- Programming experience in Java.
- Experience with event-driven architectures and message queues (Kafka, RabbitMQ).
- Knowledge of the telecom domain (billing, CRM, customer lifecycle).
- Experience shipping AI features in a customer-facing product.
- Backend experience in Go, Node.js, or Java microservices.