Principal Data Engineer

Acuity

Bangalore, 10 years' experience, posted 44 days ago

Job Description

  • Design end-to-end data architectures on GCP using services such as BigQuery, GCS, Dataproc, Composer, Dataform, Data Fusion, and related cloud-native tooling. 
  • Build and standardize scalable batch and near-real-time data pipelines that support:
      ◦ model training datasets
      ◦ feature engineering workflows
      ◦ batch and online inference use cases
      ◦ analytics and operational reporting
  • Define reusable patterns for data ingestion, transformation, quality validation, metadata capture, lineage, and observability. 
  • Design data models optimized not only for BI/reporting but also for machine learning, AI, and GenAI workloads, spanning structured, semi-structured, and unstructured data.
  • Establish patterns for change data capture (CDC), ELT, distributed processing, and data product design on modern cloud platforms.

Partner Closely with Data Science and ML Engineering 

  • Collaborate with Data Scientists, ML Engineers, Data Architects, Platform Engineers, and Product Owners to translate business and model requirements into scalable data solutions. 
  • Enable ML teams by delivering trusted, discoverable, and reproducible datasets for experimentation and production. 
  • Support feature generation and data preparation workflows used in model development and operationalization. 
  • Partner with ML engineering teams on data interfaces to Vertex AI, feature stores, model monitoring, and inference pipelines.
  • Contribute to data patterns that support AI/GenAI use cases such as prompt pipelines, retrieval-augmented generation (RAG), vector-ready data preparation, and document/content processing where applicable. 

Strategic Data Engineering Leadership 

  • Drive architecture direction for a GCP-first data platform serving analytics and AI/ML workloads. 
  • Evaluate emerging technologies and recommend best-fit solutions to improve scalability, performance, reliability, and cost efficiency. 
  • Lead modernization efforts to refactor legacy ETL into cloud-optimized, maintainable ELT and ML-ready data pipelines. 
  • Define best practices for data quality, lineage, governance, security, and lifecycle management across enterprise data assets used for analytics and ML. 

Technical Leadership and Engineering Excellence 

  • Act as a technical mentor and reviewer for senior and mid-level data engineers. 
  • Lead large cross-functional initiatives and serve as the escalation point for complex data platform and production pipeline issues. 
  • Set coding, testing, and deployment standards using GitHub and CI/CD practices.
  • Promote strong software engineering discipline across the data engineering function, including code quality, automated testing, documentation, and operational readiness. 

Operational Excellence 

  • Ensure reliability, scalability, observability, and governance across production data environments. 
  • Define and monitor SLAs/SLOs for critical data pipelines and data products. 
  • Create technical documentation, reusable frameworks, and metadata standards that improve enterprise data maturity. 
  • Partner with stakeholders to align data platform capabilities with business outcomes and ML/AI roadmap priorities. 
