Lead Machine Learning Engineer

spglobal

Hyderabad 8 Years Exp Posted 44d ago

Job Description

Responsibilities and Impact:

  • LLM & Generative AI Engineering: Architect and deploy production-scale LLM systems spanning frontier models (GPT-4 class), open-source models (such as LLaMA, Mistral, Gemma), RAG pipelines, and multi-modal AI systems incorporating text, code, images, and structured data.

  • Agentic AI Systems: Design and operationalize autonomous AI agents with multi-agent orchestration, tool-use capabilities, memory management, and enterprise-grade guardrails and observability strategies.

  • Python & Software Engineering: Write high-performance Python code following SOLID principles, lead code reviews, build reusable AI libraries, and implement rigorous testing and CI/CD practices across all ML workloads.

  • Cloud & Distributed Systems: Architect cloud-native AI infrastructure with GPU cluster management, auto-scaling inference endpoints, vector databases, and cost-optimized distributed systems for high-throughput model serving, leveraging managed AI services (such as Bedrock, Azure OpenAI, Vertex AI) alongside self-hosted deployments (such as vLLM, TGI).

  • Backend APIs & Systems Integration: Design production-grade RESTful and asynchronous APIs (using tools such as FastAPI or gRPC) that expose AI capabilities, integrate LLM services with enterprise systems, and own end-to-end performance, reliability, and security from design through production.

  • MLOps & LLMOps: Implement comprehensive ML pipelines covering training through monitoring with tools such as MLflow, Kubeflow, or SageMaker, establish prompt versioning and model governance practices, and instrument production systems with observability across performance and quality metrics.

  • DevOps & Platform Engineering: Embed AI workloads into CI/CD pipelines, champion containerization (such as Docker, Helm) and GitOps workflows, define SRE practices for ML systems, and drive platform standardization for self-service AI capabilities.

Basic Required Qualifications: 

  • 8+ years of progressive experience, with 6+ years in data science, data analytics, machine learning engineering, or similar roles. 

  • Proven ability to translate complex technical concepts for non-technical audiences with clarity and impact.  

  • History of mentoring mid-level engineers, conducting effective technical interviews, and raising the organizational engineering bar. 

  • LLM Frameworks: Extensive knowledge of and experience with tools such as LangChain, LlamaIndex, LangGraph, Hugging Face Transformers, PEFT, vLLM, Ollama, or equivalent production-grade tooling.

  • MLOps Tooling: Extensive knowledge of and experience with tools such as MLflow, SageMaker, Vertex AI, or Kubeflow, with a bias toward automation and repeatability.

  • Cloud Platforms: Deep expertise in cloud platforms such as AWS, GCP, or Azure. 

  • Python: Expert-level proficiency including async programming, performance optimization, type systems, packaging, and internal library authorship. 

  • Databases & Storage: Experience with vector databases (such as Pinecone, OpenSearch, Chroma), relational databases (such as PostgreSQL), NoSQL stores (such as Redis, DynamoDB), and object storage.

  • Containerization & Orchestration: Expertise with tools such as Docker and Helm.

  • Backend Development: Expertise in backend engineering with frameworks such as FastAPI, including REST design principles, async patterns, OAuth2/JWT, and API security best practices.

  • Distributed Systems: Experience with message queues (such as SQS), event streaming, and microservices design patterns.

Preferred Qualifications: 

  • MS in Computer Science, Machine Learning, Engineering, or a related quantitative field. 

  • Published open-source contributions in the LLM, GenAI, or NLP space.

  • Experience operating in regulated industries (finance, healthcare, legal) with AI complian
