Lead AI/ML Engineer

spglobal

Gurugram | 7 years' experience | Posted 1h ago

Design and build multi-agent workflows: Lead hands-on engineering of stateful agentic applications using agent orchestration frameworks capable of coordinating multiple autonomous components.
Agent-to-agent collaboration: Define and implement robust communication patterns that allow agents to delegate sub-tasks, negotiate execution paths, and coordinate outcomes in dynamic environments.
State, memory, and long-running execution: Engineer control flows for non-deterministic systems, including message passing, persistent memory, recoverability, and interruptible execution for long-running tasks.
Standardized tool interfaces: Establish universal interfaces between agents, enterprise data sources, and operational tools to ensure modularity, reusability, and consistent governance.
Model integration and runtime optimization: Build routing and fallback strategies across multiple model endpoints; optimize context management, latency, and inference cost while maintaining reliability.
Production deployment: Package and deploy workloads via containerization and cluster orchestration, using cloud-native services for scaling, isolation, and secure runtime operations.

Build agent-ready data pipelines: Develop and maintain high-throughput ingestion and transformation pipelines that convert raw operational signals into structured, machine-consumable context.
Real-time context injection: Ensure agents can access near-real-time operational data by designing efficient retrieval patterns and optimizing vector databases and associated retrieval architectures.
Cross-functional execution: Serve as the technical bridge between AI and data teams, translating agent needs into schemas, data contracts, SLAs, and pipeline specifications while resolving bottlenecks hands-on.

LLMOps, tracing, and debugging: Implement end-to-end observability for agent execution, including reasoning traces, performance telemetry, cost monitoring, and production debugging workflows.
Safety and control frameworks: Design hybrid autonomy modes (human-in-the-loop through fully autonomous), including approval gates, policy enforcement, and “break-glass” controls for sensitive operations.
Evaluation and reliability standards: Establish rigorous testing strategies for stochastic systems; automate evaluation pipelines to measure accuracy, failure modes, drift, and regression risk prior to deployment.

Define the agentic architecture roadmap: Partner with product and engineering leadership to scope feasibility, set technical direction, and prioritize high-impact autonomous initiatives.
Mentorship and engineering standards: Set expectations for code quality, architectural patterns, and review processes; mentor engineers to strengthen their agentic engineering practices.
Innovation to production: Rapidly prototype emerging approaches (e.g., advanced retrieval strategies, graph-based reasoning patterns) and mature successful experiments into supported production capabilities.