Senior Software Engineer

topgeek

Bengaluru, India · 5 Years Exp · Posted 27d ago

Key Responsibilities

Architecture & System Design

Architect and deploy end-to-end AI systems — from data pipelines to model serving.
Design modular SDKs for multi-provider AI integration (OpenAI, Claude, Gemini, LLaMA).
Lead decision-making on cloud vs self-hosted LLM deployment (Ollama, vLLM, TGI).
Guide infrastructure design for scalability, observability, and cost efficiency using GPU clusters, Ray, or KServe.
Collaborate with backend, MLOps, and infra teams to ensure high availability and low latency across AI workloads.

Core ML / DL Development

Train and fine-tune models (CNN, RNN, Transformers) across text, vision, and speech domains.
Implement LoRA / PEFT fine-tuning for custom LLMs, embedding models, and instruction-tuned variants.
Work with open-source and proprietary model repositories (Hugging Face Hub, Kaggle, Hugging Face Spaces).
Optimize model architectures for inference performance, quantization, and memory efficiency.
Conduct A/B testing, cross-validation, and human evaluation on model outputs.
Build internal evaluation benchmarks and dataset management pipelines for consistent model scoring and comparison.

Data & Dataset Engineering

Curate, clean, and version-control datasets for text, image, and audio modalities.
Build pipelines for data labelling, augmentation, and validation using Airflow / Prefect.
Create and manage feature stores, embedding repositories, and dataset registries.
Leverage open datasets (e.g., Common Crawl, LAION, OpenImages, LibriSpeech) and integrate custom enterprise datasets.
Ensure data governance, bias checks, and PII anonymization using Presidio or custom filters.

AI Ops & Deployment

Automate model workflows with MLflow, Kubeflow, or Vertex AI for experiment tracking and versioning.
Lead model deployment with vLLM, TGI, or TorchServe, ensuring optimized GPU/TPU utilization.
Set up continuous evaluation pipelines for model drift, bias, and quality decay using EvidentlyAI and Prometheus.
Drive adoption of model registries and model cards for transparency and reproducibility.

Team & Technical Leadership

Mentor AI/ML Engineers I & II and review their work.
Collaborate with product, design, and research teams to translate business needs into AI roadmaps.
Lead POCs and experiments for emerging AI verticals (e.g., multimodal, video, robotics, IoT intelligence).
Present internal demos, AI reports, and architectural documentation to leadership and clients.

Core Skills Required

Programming: Expert-level Python, with a deep understanding of OOP, async, and design patterns.
Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, LangChain, LlamaIndex.
Model Ops: MLflow, KServe, TorchServe, vLLM, TGI.
Data Stack: Airflow / Prefect, pgvector, Milvus, Pinecone, FAISS, PostgreSQL.
Infra: Docker, Kubernetes, Ray, GPU servers, Cloud AI (Vertex AI, Bedrock, Azure).
Evaluation & Metrics: Familiarity with BLEU, ROUGE, and latency/throughput metrics for AI models.
Security: Secure Vaults, Microsoft Presidio, and awareness of Fairlearn / AIF360 for data and bias governance.

Good-to-Have Skills

Experience with distributed training, quantization, and mixed-precision optimization.
Experience with model compression, distillation, or low-rank adaptation for efficiency.
Contribution to open-source AI frameworks or Hugging Face Spaces.
Research exposure in LLM alignment, prompt optimization, or multimodal reasoning.
Understanding of AI cost governance, observability, and MLOps automation.