CPW Data Engineer II
General Mills
Job Description
- Build best-in-class data assets that give the business an edge in the industry.
- Data Governance & Stewardship: Maintain and improve data quality, deploy the right validation metrics to ensure precise and accurate data flow, and maintain data security across all users.
- Data Pipeline Development: Design and develop robust ETL/ELT pipelines using tools like Azure Databricks and PySpark. Build and maintain bronze, silver, and gold layer data architectures. Implement incremental processing, deduplication, and performance optimization.
- Data Harmonization: Understand the current setup and work with the team and business partners to optimize and automate processes while balancing speed against perfection. Review the existing data models and identify sustainable, scalable approaches for improving data usage in analytics.
- Data Integration & Modeling: Integrate data from multiple sources (e.g., Nielsen, Circana, internal systems). Develop scalable data models for reporting and analytics use cases. Ensure consistency in dimensions like product, period, and market hierarchies.
- Integrate New Data Assets: Build a good understanding of new datasets, including panel data, RMS data, and other data sources, and set up processes to integrate them into the existing data environment.
- Support Data Engineering Needs: Understand the data needs of stakeholders and internal teams, and implement data transformation and manipulation to help answer business questions faster and smarter. Support the Reporting and Visualization team on backend data architecture and data model development for new projects.
- Performance & Optimization: Optimize queries, storage formats (Delta), and pipeline performance, and ensure the scalability and cost-efficiency of data workflows.