Principal Data Engineer
Acuity
Job Description
- Design end-to-end data architectures on GCP using services such as BigQuery, GCS, Dataproc, Composer, Dataform, Data Fusion, and related cloud-native tooling.
- Build and standardize scalable batch and near-real-time data pipelines that support:
  - model training datasets
  - feature engineering workflows
  - batch and online inference use cases
  - analytics and operational reporting
- Define reusable patterns for data ingestion, transformation, quality validation, metadata capture, lineage, and observability.
- Design data models optimized for machine learning, AI, and GenAI workloads as well as BI/reporting, covering structured, semi-structured, and unstructured data.
- Establish patterns for CDC, ELT, distributed processing, and data product design in support of modern cloud platforms.
Partner Closely with Data Science and ML Engineering
- Collaborate with Data Scientists, ML Engineers, Data Architects, Platform Engineers, and Product Owners to translate business and model requirements into scalable data solutions.
- Enable ML teams by delivering trusted, discoverable, and reproducible datasets for experimentation and production.
- Support feature generation and data preparation workflows used in model development and operationalization.
- Partner with ML engineering teams on data interfaces for tools such as Vertex AI, feature stores, model monitoring, and inference pipelines.
- Contribute to data patterns that support AI/GenAI use cases such as prompt pipelines, retrieval-augmented generation (RAG), vector-ready data preparation, and document/content processing where applicable.
Strategic Data Engineering Leadership
- Drive architecture direction for a GCP-first data platform serving analytics and AI/ML workloads.
- Evaluate emerging technologies and recommend best-fit solutions to improve scalability, performance, reliability, and cost efficiency.
- Lead modernization efforts to refactor legacy ETL into cloud-optimized, maintainable ELT and ML-ready data pipelines.
- Define best practices for data quality, lineage, governance, security, and lifecycle management across enterprise data assets used for analytics and ML.
Technical Leadership and Engineering Excellence
- Act as a technical mentor and reviewer for senior and mid-level data engineers.
- Lead large cross-functional initiatives and serve as the escalation point for complex data platform and production pipeline issues.
- Set coding, testing, and deployment standards using GitHub and CI/CD practices.
- Promote strong software engineering discipline across the data engineering function, including code quality, automated testing, documentation, and operational readiness.
Operational Excellence
- Ensure reliability, scalability, observability, and governance across production data environments.
- Define and monitor SLAs/SLOs for critical data pipelines and data products.
- Create technical documentation, reusable frameworks, and metadata standards that improve enterprise data maturity.
- Partner with stakeholders to align data platform capabilities with business outcomes and ML/AI roadmap priorities.