SENIOR SCIENTIST - Machine Learning

happiestminds

Bangalore, 5 Years Exp Posted 69d ago

Develop and maintain statistical/ML modules (difference-in-differences (DID), synthetic control, A/B testing, multi-treatment effects) in Python
Build and extend FastAPI services and integrate them with our web application via SDK wrappers
Design and optimize large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake
Profile and resolve out-of-memory (OOM) issues in PySpark jobs: optimize memory allocation, partitioning, broadcast joins, caching strategies, and Spark configurations
Deploy and manage workloads on Databricks, including job clusters, notebooks, and Delta Lake tables
Containerize and deploy services using Docker, Kubernetes, and CI/CD pipelines
Ensure code quality and security via SonarCloud, Snyk, and pytest
Collaborate with data scientists and product teams to translate research into production-ready modules

Must-Have Skills

Python (3.9+): 3+ years of production experience
PySpark & Spark Internals: strong experience with the Spark memory model, executor tuning, shuffle optimization, and diagnosing/resolving OOM errors (broadcast thresholds, partition skew, spill-to-disk, GC tuning)
Databricks: hands-on with job orchestration, cluster configuration, notebook workflows, and Delta Lake optimization (Z-ordering, compaction, caching)
Causal Inference & Experimentation: DID, synthetic control, A/B testing, hypothesis testing, panel data methods
Statistics/ML Libraries: statsmodels, scikit-learn, scipy, pandas, numpy
API Development: building RESTful services with FastAPI (or similar)
Cloud (Azure): Azure Storage, Azure ML, Data Lake
Docker & Kubernetes: containerization and orchestration for ML workloads
Testing: writing robust unit/integration tests with pytest

Nice-to-Have

What Sets You Apart

You can look at a Spark execution plan and pinpoint why a job is OOM-ing
You think in modules: clean separation of data processing, inference, and post-processing
You can go from a Jupyter notebook prototype to a production-grade, testable service
You're comfortable with both statistical rigor and software engineering best practices