AI Engineer

IBM

Bangalore | 3 years of experience

Enable and optimize LLMs for training and inference on IBM Z, GPUs, and AI accelerators
Drive performance improvements (latency, throughput, memory efficiency) for production workloads
Implement LLM optimizations such as KV cache management, efficient attention, and optimized execution strategies
Evaluate and validate LLMs at the model and operator levels to ensure functional correctness, numerical accuracy, and model quality
Evaluate LLMs using quality and benchmarking frameworks (RAGAS, DeepEval, etc.)
Analyze and optimize tensor shapes, strides, and memory layouts to ensure efficient and correct execution across PyTorch and accelerator backends
Build and scale distributed training and inference systems across multi-GPU and multi-node environments
Develop high-performance kernels (CUDA/Triton) for compute-intensive workloads such as attention and quantization
Profile and debug performance using PyTorch Profiler, TensorBoard, and system-level tools, focusing on compute, memory, and communication bottlenecks
Build and maintain scalable infrastructure (Docker, Kubernetes) for reproducible and stable deployments
Collaborate with compiler and backend teams and contribute to the PyTorch ecosystem (TorchDynamo, TorchInductor)

Required education

Bachelor's Degree

Preferred education

Bachelor's Degree

Required technical and professional expertise

3+ years of experience in AI/ML systems, deep learning, or performance engineering
Strong Python programming skills (required) and working knowledge of C++
Strong understanding of PyTorch internals (Autograd, ATen, Dispatcher) and exposure to the compiler stack (TorchDynamo, TorchInductor, torch.compile)
Good understanding of LLM architectures (Transformers, attention variants, KV cache, and efficient attention techniques such as FlashAttention or PagedAttention)
Experience in model optimization and performance tuning (latency, throughput, memory)
Strong understanding of tensor operations (shapes, strides, memory layouts) and their impact on execution
Experience with distributed training/inference frameworks (FSDP, DeepSpeed, or similar)
Familiarity with multi-GPU and multi-node environments and parallel execution
Experience in profiling and debugging using tools like PyTorch Profiler, TensorBoard, or similar
Good understanding of LLM evaluation and validation (performance and quality metrics)
Experience with Linux environments and containerization (Docker)
Strong problem-solving skills and the ability to debug complex system-level and model-level issues