ML Engineer (Infrastructure & Optimisation)
Job Description
What You'll Do
Model Deployment & Optimization
• Lead end-to-end deployments of large language models on AWS infrastructure for strategic
customers
• Design and implement training, fine-tuning, and inference pipelines using Amazon SageMaker AI
• Optimize model performance through GPU-level tuning, kernel optimization, and infrastructure
configuration
• Deploy models on diverse GPU architectures including NVIDIA and AWS custom silicon (Trainium,
Inferentia)
Infrastructure Architecture & Performance
• Architect scalable ML infrastructure using SageMaker AI Inference, HyperPod, and distributed
training frameworks
• Implement CUDA-level optimizations and custom kernels for improved model performance
• Design storage and networking architectures optimized for high-throughput ML workloads
• Troubleshoot and resolve complex performance bottlenecks at the GPU driver and kernel level
Customer Engagement & Technical Leadership
• Partner with AWS AI Specialist Solution Architects and customer ML teams to understand model
requirements and deployment constraints
• Provide technical guidance on model selection, fine-tuning strategies, and production best practices
• Conduct performance benchmarking and cost optimization analysis for ML workloads
• Share field insights with AWS product teams to influence infrastructure and service roadmaps
What We're Looking For
Core Qualifications
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience (Master's or
PhD preferred)
• 5+ years of experience in machine learning infrastructure, model deployment, or GPU computing
• Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow, JAX)
• Deep understanding of LLM architectures, training methodologies, and inference optimization
Technical Expertise (High-Level Alignment)
• Hands-on experience training, fine-tuning, or deploying large language models in production
• Proficiency with GPU programming, CUDA, and kernel-level optimization techniques
• Experience with distributed training frameworks and multi-GPU/multi-node orchestration
• Strong knowledge of AWS core services: EC2 (GPU instances), S3, EFS, VPC, and networking
Preferred Experience
• Direct experience with Amazon SageMaker AI (Training, Inference, HyperPod) or equivalent ML
platforms
• Understanding of GPU architectures (NVIDIA A100, H100) and AWS custom silicon (Trainium,
Inferentia)
• Experience with model compression techniques (quantization, pruning, distillation)
• Knowledge of MLOps practices, model monitoring, and production ML system design
• Background in high-performance computing, distributed systems, or systems programming
Essential Attributes
• Ability to dive deep into technical problems and debug complex infrastructure issues
• Strong analytical skills with a data-driven approach to optimization
• Excellent communication skills to explain complex technical concepts to diverse audiences
• Comfortable working in ambiguous, fast-paced environments with evolving requirements
• Ownership mindset with the ability to drive projects from architecture to production