Data Science

capgemini

Mumbai, India 6 Years Exp Posted 16d ago

Over 6 years of hands-on experience with Azure Data Services including Data Lake, Data Factory, Key Vault and Cognitive Search.

Proficient in Databricks ecosystem including cluster optimization, performance tuning with expertise in Delta Lake, PySpark and orchestrating workflows.
Experience with any relational SQL (SQL Server/Oracle) and NoSQL (MongoDB/DynamoDB) databases including Snowflake along with strong expertise in Python/PySpark for large‑scale data processing.
Experience with real-time systems like Event Hubs, Apache Kafka, Spark-Streaming, etc.
Experience with any Big Data frameworks like Spark/Kafka/ Hive/ Hadoop etc.
Strong programming skills in Python and SQL for data engineering and analytics.
Basic understanding of GenAI concepts including RAG and related AI/ML technologies and experience in generating embeddings for both structured and unstructured data sources would be preferred.
Familiarity with DevOps practices including CI/CD pipelines, automation strategies and experience in technologies like GitHub and Bitbucket will be good to have.

Working knowledge of BI tools (Tableau, Power BI) and data engineering platforms (Microsoft Fabric, Apache Storm, Apache NiFi) for reporting and pipeline setup will be beneficial.