Data Engineer
ennvee
Job Description
Essential Responsibilities
- Build, enhance, and maintain production data pipelines and datasets on a modern cloud data platform (Databricks or Snowflake), with an emphasis on stability, reliability, and continuous improvement.
- Develop efficient ingestion, transformation, and curation workflows using industry-standard patterns such as the medallion architecture (bronze / silver / gold) or an equivalent layered design.
- Design and implement dimensional and analytical data models (Kimball star schema, Data Vault, or equivalent) that support reporting, self-service analytics, and downstream AI/ML workloads.
- Troubleshoot and resolve data pipeline, data quality, and platform issues promptly, with clear root-cause analysis and durable fixes.
- Partner with stakeholders across the organization to understand data needs, translate requirements into technical designs, and set clear expectations on scope and delivery.
- Contribute to data security and governance — including access controls, PII handling, row-level security, masking, and usage logging — using tools such as Unity Catalog, Snowflake Horizon, or equivalent.
- Implement data quality checks and observability (expectations, tests, monitoring, alerting) to ensure trustworthy datasets for downstream consumers.
- Support analysts and report builders with dataset design, documentation, and best practices for modern BI tools (Power BI, Tableau, Looker, or similar).
- Participate in code reviews, CI/CD deployments, and change management; own the quality of your releases to production.
- Stay current with platform features and recommend adoption of new capabilities where they drive measurable value.
Required Qualifications
- 3–5 years of hands-on experience in a data engineering or closely related technical role.
- Production experience delivering solutions on a modern cloud data platform — Databricks or Snowflake (Databricks strongly preferred).
- Strong proficiency in SQL and Python, including writing performant, well-tested, production-grade code.
- Hands-on experience building ETL/ELT pipelines — ingestion, transformation, cleansing, and curation — against large, complex datasets.
- Working knowledge of data modeling techniques (Kimball / dimensional modeling, Data Vault, or medallion architecture) and when to apply each.
- Experience with workflow orchestration and transformation tools such as Apache Airflow, Azure Data Factory, Databricks Workflows, dbt, or equivalent.
- Experience integrating with enterprise source systems — ERPs (e.g., SAP, Oracle, Dynamics, Workday), CRMs, APIs, and relational databases.
- Hands-on experience with at least one major cloud provider (Azure, AWS, or GCP); Azure preferred.
- Experience with Git-based version control and CI/CD for data pipelines (Azure DevOps, GitHub Actions, GitLab CI, or similar).
- Exposure to data quality and observability practices — test frameworks, expectations, lineage, monitoring, and alerting (Great Expectations, dbt tests, Monte Carlo, or similar).
- Familiarity with Agile/Scrum delivery and collaborative development environments.
- Bachelor’s degree in Computer Science, Engineering, or another STEM field, or equivalent practical experience.
Preferred Qualifications
- Production experience with Databricks Unity Catalog, Delta Lake, and Delta Live Tables; or Snowflake equivalents (Horizon, Dynamic Tables, Streams & Tasks).
- Experience with streaming / real-time data pipelines (Kafka, Event Hubs, Kinesis, Structured Streaming, Snowpipe Streaming) and/or IoT data patterns.
- Working knowledge of machine learning (ML), large language models (LLMs), and common AI/ML data enablement patterns (feature stores, vector stores, RAG).
- Experience managing platform cost and performance — cluster/warehouse sizing, cost reporting, budgets, and alerting.
- Experience administering a modern BI platform (Power BI, Tableau, Looker) — workspace governance, certified datasets, and best-practice enforcement.
- Experience with Infrastructure as Code (Terraform, Bicep).
- Experience with R, Scala, or other statistical / JVM-based programming languages.
Preferred Certifications
- Databricks Certified Data Engineer Associate or Professional
- SnowPro Core / SnowPro Advanced: Data Engineer
- Microsoft Certified: Azure Data Engineer Associate
- AWS Certified Data Engineer – Associate, or Google Cloud Professional Data Engineer
Soft Skills & Ways of Working
- Strong written and verbal communication — able to explain technical concepts clearly to both tech