Lead AI/ML Engineer

S&P Global

Gurugram | 7 years' experience | Posted 1h ago

Job Description

1) Agentic Systems Architecture & Core Engineering

  • Design and build multi-agent workflows: Lead hands-on engineering of stateful agentic applications using agent orchestration frameworks capable of coordinating multiple autonomous components.
  • Agent-to-agent collaboration: Define and implement robust communication patterns that allow agents to delegate sub-tasks, negotiate execution paths, and coordinate outcomes in dynamic environments.
  • State, memory, and long-running execution: Engineer control flows for non-deterministic systems, including message passing, persistent memory, recoverability, and interruptible execution for long-running tasks.
  • Standardized tool interfaces: Establish universal interfaces between agents, enterprise data sources, and operational tools to ensure modularity, reusability, and consistent governance.
  • Model integration and runtime optimization: Build routing and fallback strategies across multiple model endpoints; optimize context management, latency, and inference cost while maintaining reliability.
  • Production deployment: Package and deploy workloads via containerization and cluster orchestration, using cloud-native services for scaling, isolation, and secure runtime operations.

2) Data Engineering & Operational Real-Time Integration

  • Build agent-ready data pipelines: Develop and maintain high-throughput ingestion and transformation pipelines that convert raw operational signals into structured, machine-consumable context.
  • Real-time context injection: Ensure agents can access near-real-time operational data by designing efficient retrieval patterns and optimizing vector databases and associated retrieval architectures.
  • Cross-functional execution: Serve as the technical bridge between AI and data teams, translating agent needs into schemas, data contracts, SLAs, and pipeline specifications, and resolving bottlenecks hands-on.

3) Observability, Governance & Human-in-the-Loop

  • LLMOps, tracing, and debugging: Implement end-to-end observability for agent execution, including reasoning traces, performance telemetry, cost monitoring, and production debugging workflows.
  • Safety and control frameworks: Design hybrid autonomy modes ranging from human-in-the-loop to fully autonomous, including approval gates, policy enforcement, and “break-glass” controls for sensitive operations.
  • Evaluation and reliability standards: Establish rigorous testing strategies for stochastic systems; automate evaluation pipelines to measure accuracy, failure modes, drift, and regression risk prior to deployment.

4) Technical Leadership & Strategy

  • Define the agentic architecture roadmap: Partner with product and engineering leadership to scope feasibility, set technical direction, and prioritize high-impact autonomous initiatives.
  • Mentorship and engineering standards: Set expectations for code quality, architectural patterns, and review processes; mentor engineers to strengthen their agentic engineering practices.
  • Innovation to production: Rapidly prototype emerging approaches (e.g., advanced retrieval strategies, graph-based reasoning patterns) and mature successful experiments into supported production capabilities.
