Data Engineer (ETL & Cloud Data Pipelines)

ycombinator

Bengaluru, India | 3 Years Exp | Posted 79d ago

Pipeline Development:

Architect and implement ETL/ELT workflows using tools like Apache Airflow, dbt, or equivalent
Build batch and streaming pipelines with Kafka, Spark, Beam, or similar frameworks
Ensure reliable ingestion from diverse sources (APIs, databases, logs, message queues)

Data Modeling & Warehousing:

Design, optimize, and maintain star schemas, data vaults, and dimensional models
Work with cloud warehouses (Snowflake, BigQuery, Redshift) or on-premises systems

Data Quality & Governance:

Implement validation, profiling, and monitoring to ensure data accuracy and completeness
Enforce data lineage, schema evolution, and versioning best practices

Platform Operations:

Containerize and deploy pipelines via Docker/Kubernetes or managed services
Build CI/CD pipelines for data workflows and maintain observability (Prometheus, Grafana, ELK, Datadog)
Optimize performance and cost of storage, compute, and network resources

Collaboration & Documentation:

Partner with analytics, ML, and product teams to translate requirements into data solutions
Document data designs, pipeline configurations, and operational runbooks
Participate in code reviews, capacity planning, and incident response

Requirements

3+ years of professional data engineering experience
Proficiency in one or more languages: Python, Java, or Scala
Strong SQL skills and experience with relational databases (PostgreSQL, MySQL)
Hands-on experience with at least one orchestration framework (Airflow, Prefect, Dagster)
Familiarity with cloud platforms (AWS, GCP, or Azure) and their data services
Experience with data warehousing solutions (Snowflake, BigQuery, Redshift)
Solid understanding of streaming technologies (Apache Kafka, Pub/Sub)
Ability to write clean, well-tested code and ETL configurations
Comfortable working in Agile/Scrum teams and collaborating cross-functionally

Preferred (Nice-to-Have)

Experience with data transformation tools (dbt, Matillion, Fivetran)
Knowledge of workflow engines or orchestration beyond ETL (Temporal, Airflow XComs)
Exposure to vector databases or embeddings pipelines for AI/ML use cases
Familiarity with LLM integration concepts such as prompting, RAG, and feature store design
Contributions to open-source data tools or active participation in data engineering communities

What We Offer

Impactful Projects: Build the data foundation for high-growth analytics and AI initiatives
Cutting-Edge Tech: Work with modern pipelines, cloud services, and real-time streaming
Professional Growth: Access mentorship, training budgets, and conference stipends