Senior AI Engineer
Job Description
- Build production-grade AI/ML applications and services using Python, following software engineering best practices for clean code, testing, documentation, reliability, and maintainability.
- Design and develop scalable data and ML pipelines for batch and real-time processing using Kafka, distributed processing frameworks, and workflow orchestration tools.
- Implement end-to-end model training, evaluation, deployment, and inference workflows that can scale across large datasets and enterprise workloads.
- Build and deploy REST APIs, microservices, and event-driven integrations to expose AI/ML capabilities to downstream applications and business workflows.
- Work with SQL and NoSQL databases to store, retrieve, transform, and manage structured and unstructured data for AI/ML use cases.
- Support production deployment and lifecycle management through CI/CD, containerization, model versioning, automated validation, and release processes.
- Implement observability and reliability mechanisms, including logging, monitoring, alerting, error handling, and root-cause analysis for production AI systems.
- Optimize model services for latency, throughput, cost, scalability, and operational performance.
- Collaborate with Data Scientists to convert prototypes, notebooks, and experimental models into reliable, maintainable, and scalable production solutions.
- Evaluate and integrate GenAI / LLM components, including prompt workflows, RAG pipelines, evaluators, guardrails, and orchestration patterns where applicable.
About You:
- Bachelor’s degree in Computer Science, or equivalent experience, plus 4+ years of work in software design, development, and algorithmic solutions.
Must-Have Skills:
- Proven experience building and deploying end-to-end AI/ML pipelines, including:
  - Data preparation and feature engineering
  - Model training and evaluation
  - Model deployment and inferencing
  - Production monitoring and lifecycle management
- Strong hands-on experience with MLOps practices and tools, including CI/CD for ML, model versioning, automated retraining, and production deployment.
Preferred / Good-to-Have Skills:
- Experience building applications using Generative AI (LLMs), including prompt engineering, RAG architectures, evaluation frameworks, and model orchestration.
- Exposure to Agentic AI systems, including multi-agent workflows, planning, tool usage, orchestration frameworks, and autonomous decision-making patterns.
- Experience implementing LLM observability, evaluation, and guardrails for production GenAI systems.
- Experience building reusable AI platforms or shared ML services used across multiple teams.
- Experience designing and operating scalable inference systems capable of supporting production workloads.
- Good understanding of observability and reliability for ML systems, including monitoring, alerting, performance tracking, debugging, and root-cause analysis.
- Strong software engineering fundamentals, including Python development, testing, code reviews, error handling, and production-quality coding practices.
- Experience working with cloud-based ML platforms and modern ML frameworks.