Principal Product AI Data Platform Engineer
Clarivate
Job Description
- 10+ years of professional experience in Data Engineering, Analytics Engineering, or Data Architecture.
- Proven experience in enterprise-scale data architecture and distributed pipeline design.
- Expert-level proficiency in SQL and relational database design.
- Strong hands-on experience in Python for pipeline automation, orchestration, and data framework development.
- Deep expertise in dimensional modeling, including star and snowflake schemas, fact/dimension tables, SCDs, surrogate keys, and hierarchical dimensions.
- Experience designing and operating production-grade ETL/ELT pipelines for analytics and AI/ML workloads.
- Strong ability to influence technical outcomes through architectural leadership and enterprise strategy.
It would be great if you also had
- Experience with cloud data warehouses: Snowflake, Databricks, BigQuery.
- Familiarity with modern data orchestration and transformation tools: dbt, Airflow, Fivetran, Segment.
- Experience handling semi-structured and event-driven data (JSON, logs, clickstream).
- Exposure to BI and visualization tools: Power BI, Tableau, Looker, SAP BusinessObjects.
- Experience with AWS, Azure, or GCP, including data governance, security, and compliance frameworks.
- Background in Life Sciences or Healthcare analytics is a big plus.
What will you be doing in this role?
- Define and evolve enterprise-level product data architecture across multiple product lines, ensuring scalability, reliability, and AI/ML readiness.
- Architect scalable ETL/ELT pipelines and distributed data workflows for analytics, AI, and product intelligence.
- Develop and enforce dimensional data modeling standards (star schemas, snowflake schemas) across the organization.
- Design and maintain fact and dimension tables, ensuring proper grain, SCD handling, hierarchical dimensions, and high-performance queries.
- Establish data architecture principles, naming conventions, and best practices for ETL/ELT, event tracking, and AI pipelines.
- Serve as the technical authority guiding architecture decisions to meet product, platform, and AI requirements.
AI & Product Data Enablement
- Partner with cross-functional teams to translate requirements into highly scalable, analytics- and AI-ready data models and pipelines.
- Curate and validate datasets for machine learning, experimentation, and advanced analytics.
- Evolve event-driven architectures to align with dimensional modeling and downstream analytics.
- Establish feature store frameworks and reusable AI data pipelines across multiple products.
- De