ML Engineer (Infrastructure & Optimisation)

cutshort

Bengaluru, India | 5 Years Exp | Posted 46d ago

Job Description

What You'll Do

Model Deployment & Optimization

• Lead end-to-end deployments of large language models on AWS infrastructure for strategic customers

• Design and implement training, fine-tuning, and inference pipelines using Amazon SageMaker AI

• Optimize model performance through GPU-level tuning, kernel optimization, and infrastructure configuration

• Deploy models on diverse GPU architectures including NVIDIA and AWS custom silicon (Trainium, Inferentia)

Infrastructure Architecture & Performance

• Architect scalable ML infrastructure using SageMaker AI Inference, HyperPod, and distributed training frameworks

• Implement CUDA-level optimizations and custom kernels for improved model performance

• Design storage and networking architectures optimized for high-throughput ML workloads

• Troubleshoot and resolve complex performance bottlenecks at the GPU driver and kernel level

Customer Engagement & Technical Leadership

• Partner with AWS AI Specialist Solution Architects and customer ML teams to understand model requirements and deployment constraints

• Provide technical guidance on model selection, fine-tuning strategies, and production best practices

• Conduct performance benchmarking and cost optimization analysis for ML workloads

• Share field insights with AWS product teams to influence infrastructure and service roadmaps

 

What We're Looking For

Core Qualifications

• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience (Master's or PhD preferred)

• 5+ years of experience in machine learning infrastructure, model deployment, or GPU computing

• Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow, JAX)

• Deep understanding of LLM architectures, training methodologies, and inference optimization

Technical Expertise (High-Level Alignment)

• Hands-on experience training, fine-tuning, or deploying large language models in production

• Proficiency with GPU programming, CUDA, and kernel-level optimization techniques

• Experience with distributed training frameworks and multi-GPU/multi-node orchestration

• Strong knowledge of AWS core services: EC2 (GPU instances), S3, EFS, VPC, and networking

 

Preferred Experience

• Direct experience with Amazon SageMaker AI (Training, Inference, HyperPod) or equivalent ML platforms

• Understanding of GPU architectures (NVIDIA A100, H100) and AWS custom silicon (Trainium, Inferentia)

• Experience with model compression techniques (quantization, pruning, distillation)

• Knowledge of MLOps practices, model monitoring, and production ML system design

• Background in high-performance computing, distributed systems, or systems programming

Essential Attributes

• Ability to dive deep into technical problems and debug complex infrastructure issues

• Strong analytical skills with a data-driven approach to optimization

• Excellent communication skills to explain complex technical concepts to diverse audiences

• Comfortable working in ambiguous, fast-paced environments with evolving requirements

• Ownership mindset with the ability to drive projects from architecture to production

 
