Senior Machine Learning Engineer

Bengaluru, India 6 Years Exp Posted 48d ago

Job Description

Why this Role Matters 

  • Accelerate the rollout of LLM-powered and agent-driven features across Tekion products. 
  • Enable agentic workflows that automate, reason, and interact on behalf of users and internal stakeholders. 
  • Operationalize secure, compliant, and explainable LLM and agentic services at scale. 
  • Convert Applied Sciences models into scalable, compliant, cost‑efficient production services. 
  • Standardize how models are trained, validated, deployed, and monitored across Tekion products. 
  • Power real-time, context-aware experiences by integrating batch/stream features, graph context, and online inference. 

What You’ll Do 

  • Turn Applied Sciences prototype models (tabular, NLP/LLM, recommendation, forecasting) into fast, reliable services with well-defined API contracts. 
  • Integrate with the LLM Gateway/MCP and with prompt/config versioning. 
  • Build and orchestrate CI/CD pipelines. 
  • Review data science models; refactor and optimize code; containerize; deploy; version; and monitor for quality. 
  • Collaborate with data scientists, data engineers, product managers, and architects to design enterprise systems. 
  • Monitor, detect, and mitigate risks unique to LLMs and agentic systems. 
  • Implement prompt management: versioning, A/B testing, guardrails, and dynamic orchestration based on feedback and metrics. 
  • Design batch/stream pipelines (Airflow/Kubeflow, Spark/Flink, Kafka) and online features linked to our domain graph. 
  • Build inference microservices (REST/gRPC) with schema versioning, structured outputs, and stringent p95 latency targets. 
  • Manage the model/feature lifecycle: feature store strategy, model/agent registry, versioning, and lineage. 
  • Instrument deep observability: traces/logs/metrics, data/feature drift, model performance, safety signals, and cost tracking. 
  • Ensure real-time reliability: autoscaling, caching, circuit breakers, retries/fallbacks, and graceful degradation. 
  • Develop templates/SDKs/CLIs, sandbox datasets, and documentation that make shipping ML the default path. 

Desired Skills and Experience 

  • 6+ years in ML engineering/MLOps or backend/platform engineering with production ML. 
  • Experience with LLMs, retrieval systems, vector stores, and graph/knowledge stores. 
  • Strong software engineering fundamentals: Python plus one of Java/Go/Scala; API design; concurrency; testing. 
  • Hands-on with orchestration frameworks and libraries (LangChain, LlamaIndex, OpenAI Function Calling, AgentKit, etc.). 
  • Knowledge of agent architectures (reactive, planning, retrieval-augmented agents), and safe execution patterns. 
  • Pipelines and data: Airflow/Kubeflow or similar; Spark/Flink; Kafka/Kinesis; strong data quality practices. 
  • Microservices and runtime: Docker/Kubernetes, service meshes, REST/gRPC; performance and reliability engineering. 
  • Model ops: experiment tracking, registries (e.g., MLflow), feature stores, A/B and shadow testing, drift detection. 
  • Observability: OpenTelemetry/Prometheus/Grafana; debugging latency, tail behavior, and memory/CPU hotspots. 
  • Cloud: AWS preferred (IAM, ECS/EKS, S3, RDS/DynamoDB, Step Functions/Lambda), with cost optimization experience. 
  • Security/compliance: secrets management, RBAC/ABAC, PII handling, auditability. 

Preferred Mindset 

  • Product-oriented: You measure success by dealer and consumer outcomes, not just technical metrics. 
  • Reliability- and safety-first: You move fast with guardrails, rollbacks, and clear SLOs. 
  • Systems thinker: You design for multi-tenant scale, portability, and cost efficiency. 
  • Collaborative: You translate between Applied Sciences, Product, and the Data & AI Platform; you document and teach. 
