AI Engineer

IBM

Bangalore | 3 Years Exp | Posted 1h ago

Job Description

  • Enable and optimize LLMs for training and inference on IBM Z, GPUs, and AI accelerators
  • Drive performance improvements (latency, throughput, memory efficiency) for production workloads
  • Implement LLM optimizations such as KV cache management, efficient attention, and optimized execution strategies
  • Evaluate and validate LLMs at both the model level and the operator level to ensure functional correctness, numerical accuracy, and model quality
  • Evaluate LLMs using quality and benchmarking frameworks (RAGAS, DeepEval, etc.)
  • Analyze and optimize tensor shapes, strides, and memory layouts to ensure efficient and correct execution across PyTorch and accelerator backends
  • Build and scale distributed training and inference systems across multi-GPU and multi-node environments
  • Develop high-performance kernels (CUDA/Triton) for compute-intensive workloads such as attention and quantization
  • Profile and debug performance using PyTorch Profiler, TensorBoard, and system-level tools, focusing on compute, memory, and communication bottlenecks
  • Build and maintain scalable infrastructure (Docker, Kubernetes) for reproducible and stable deployments
  • Collaborate with compiler and backend teams, and contribute to the PyTorch ecosystem (TorchDynamo, TorchInductor)

Required education

Bachelor's Degree

Preferred education

Bachelor's Degree

Required technical and professional expertise

  • 3+ years of experience in AI/ML systems, deep learning, or performance engineering
  • Strong programming skills in Python (required) and working knowledge of C++
  • Strong understanding of PyTorch internals (Autograd, ATen, Dispatcher) and exposure to the compiler stack (TorchDynamo, TorchInductor, torch.compile)
  • Good understanding of LLM architectures (Transformers, attention variants, KV cache, and efficient attention techniques such as Flash Attention or Paged Attention)
  • Experience in model optimization and performance tuning (latency, throughput, memory)
  • Strong understanding of tensor operations (shapes, strides, memory layouts) and their impact on execution
  • Experience with distributed training/inference frameworks (FSDP, DeepSpeed, or similar)
  • Familiarity with multi-GPU / multi-node environments and parallel execution
  • Experience in profiling and debugging using tools like PyTorch Profiler, TensorBoard, or similar
  • Good understanding of LLM evaluation and validation (performance and quality metrics)
  • Experience with Linux environments and containerization (Docker)
  • Strong problem-solving skills with the ability to debug complex system-level and model-level issues
