Principal Data Engineer
Job Description
Data Architecture & Engineering
- Architect, develop, and optimize complex data pipelines, data models, and ELT/ETL workflows across Synapse, Databricks, SQL Server, and cloud-native services.
- Lead large-scale data integration and modernization projects, including analyzing requirements, creating specifications, and managing delivery schedules.
- Define and enforce best practices for data engineering, performance optimization, data quality, reliability, and observability.
Cross-Functional Collaboration
- Advise senior leaders and product owners on data strategy, platform capabilities, and architectural tradeoffs.
- Partner with Security, Infrastructure, and business teams to enable high‑quality, trusted, and governed data products.
Advanced Analytics & BI Enablement
- Design and optimize highly complex, high‑volume, and low‑latency data pipelines supporting analytics, operational workloads, and ML/AI use cases.
- Oversee implementation of robust data quality, lineage, cataloging, and metadata capabilities in partnership with Data Governance.
Cloud, Platform & Tooling Expertise
- Serve as the technical authority on Azure Synapse, Databricks, Spark, SQL Server, and cloud-native data services.
- Lead platform enhancements for performance, cost optimization, reliability, and global scale.
Mentorship & Technical Stewardship
- Coach and mentor Senior and Mid‑Level Data Engineers, elevating engineering capability across the organization.
- Lead technical design reviews, provide architectural guidance, and cultivate a culture of engineering excellence.
- Champion innovation through research, POCs, and evaluation of emerging technologies.
Who You Are
- 10+ years of data engineering, data architecture, or analytics engineering experience.
- Proven experience as a lead or principal-level engineer designing large-scale, distributed data systems.
- Expert-level proficiency in:
  - SQL, performance tuning, query optimization
  - Azure Synapse, Databricks, Spark/Delta Lake
  - ETL/ELT pipelines, orchestration frameworks
  - Data modeling, OLTP/OLAP, dimensional and semantic modeling
  - Cloud architecture patterns and distributed computing