Senior Data Platform Engineer

ashbyhq

Bangalore | 5 Years Exp | Posted 1d ago

Design, build, and operate large-scale data processing infrastructure using Spark on Databricks — ensuring reliability, performance, and cost efficiency at scale.
Architect and maintain lakehouse solutions (Delta Lake, Iceberg) including partitioning strategies, Z-ordering, and compaction jobs.
Own cluster management, autoscaling policies, and resource governance across Databricks workspaces.
Drive platform-level improvements: query optimisation, caching strategies, compute–storage separation, and shuffle tuning.

Design and build robust, idempotent, and testable data pipelines handling batch and near-real-time workloads.
Manage and extend our Airflow-based orchestration layer — DAG authoring standards, dependency management, alerting, and SLA enforcement.
Implement and maintain CDC pipelines (Debezium, Kafka Connect, or native DB replication) ensuring low-latency, high-fidelity data propagation.
Define data pipeline contracts (schemas, SLAs, quality assertions) and enforce them via automated data quality frameworks.

Model and manage analytical data stores — dimensional models, OBT patterns, and aggregation layers optimised for BI and self-serve analytics.
Own the evolution of our analytical warehouse/lakehouse stack — performance benchmarking, cost modelling, and technology selection.
Build and maintain efficient data serving layers for dashboards, ML feature stores, and reverse ETL use cases.
Implement data retention, archival, and lifecycle management policies across hot/warm/cold storage tiers.

Define and enforce data platform engineering best practices — code standards, CI/CD for pipelines, automated testing, and observability.
Build internal tooling and libraries that make data engineers faster: reusable Spark utilities, pipeline templates, local dev environments.
Champion data reliability engineering: lineage tracking, incident response playbooks, pipeline SLO monitoring, and root cause analysis.