Senior Data Platform Engineer

ashbyhq

Bangalore | 5 Years Exp | Posted 1d ago

Job Description

  1. Big Data Platform & Infrastructure

  • Design, build, and operate large-scale data processing infrastructure using Spark on Databricks — ensuring reliability, performance, and cost efficiency at scale.

  • Architect and maintain lakehouse solutions (Delta Lake, Iceberg) including partitioning strategies, Z-ordering, and compaction jobs.

  • Own cluster management, autoscaling policies, and resource governance across Databricks workspaces.

  • Drive platform-level improvements: query optimisation, caching strategies, compute–storage separation, and shuffle tuning.

  2. ETL / ELT Pipeline Engineering

  • Design and build robust, idempotent, and testable data pipelines handling batch and near-real-time workloads.

  • Manage and extend our Airflow-based orchestration layer — DAG authoring standards, dependency management, alerting, and SLA enforcement.

  • Implement and maintain CDC pipelines (Debezium, Kafka Connect, or native DB replication) ensuring low-latency, high-fidelity data propagation.

  • Define data pipeline contracts (schemas, SLAs, quality assertions) and enforce them via automated data quality frameworks.

  3. Analytical Storage & Computation

  • Model and manage analytical data stores — dimensional models, OBT patterns, and aggregation layers optimised for BI and self-serve analytics.

  • Own the evolution of our analytical warehouse/lakehouse stack — performance benchmarking, cost modelling, and technology selection.

  • Build and maintain efficient data serving layers for dashboards, ML feature stores, and reverse ETL use cases.

  • Implement data retention, archival, and lifecycle management policies across hot/warm/cold storage tiers.

  4. Platform Engineering & Developer Experience

  • Define and enforce data platform engineering best practices — code standards, CI/CD for pipelines, automated testing, and observability.

  • Build internal tooling and libraries that make data engineers faster: reusable Spark utilities, pipeline templates, local dev environments.

  • Champion data reliability engineering: lineage tracking, incident response playbooks, pipeline SLO monitoring, and root cause analysis.
