Data Science
Capgemini
Job Description
- Over 6 years of hands-on experience with Azure Data Services including Data Lake, Data Factory, Key Vault and Cognitive Search.
- Proficient in the Databricks ecosystem, including cluster optimization and performance tuning, with expertise in Delta Lake, PySpark and workflow orchestration.
- Experience with relational SQL databases (SQL Server/Oracle), NoSQL databases (MongoDB/DynamoDB) and Snowflake, along with strong expertise in Python/PySpark for large-scale data processing.
- Experience with real-time systems such as Event Hubs, Apache Kafka and Spark Streaming.
- Experience with any Big Data framework such as Spark, Kafka, Hive or Hadoop.
- Strong programming skills in Python and SQL for data engineering and analytics.
- Basic understanding of GenAI concepts, including RAG and related AI/ML technologies; experience generating embeddings for both structured and unstructured data sources is preferred.
- Familiarity with DevOps practices, including CI/CD pipelines and automation strategies; experience with tools such as GitHub and Bitbucket is good to have.
- Working knowledge of BI tools (Tableau, Power BI) and data engineering platforms (Microsoft Fabric, Apache Storm, Apache NiFi) for reporting and pipeline setup will be beneficial.