Data Engineer (ETL & Cloud Data Pipelines)

ycombinator

Bengaluru, India | 3 Years Exp | Posted 31d ago

Job Description

What You'll Do at CYBLE:

Pipeline Development:

  • Architect and implement ETL/ELT workflows using tools like Apache Airflow, dbt, or equivalent
  • Build batch and streaming pipelines with Kafka, Spark, Beam, or similar frameworks
  • Ensure reliable ingestion from diverse sources (APIs, databases, logs, message queues)

Data Modeling & Warehousing:

  • Design, optimize, and maintain star schemas, data vaults, and dimensional models
  • Work with cloud warehouses (Snowflake, BigQuery, Redshift) or on-premise systems

Data Quality & Governance:

  • Implement validation, profiling, and monitoring to ensure data accuracy and completeness
  • Enforce data lineage, schema evolution, and versioning best practices

Platform Operations:

  • Containerize and deploy pipelines via Docker/Kubernetes or managed services
  • Build CI/CD for data workflows and maintain observability (Prometheus, Grafana, ELK, Datadog)
  • Optimize performance and cost of storage, compute, and network resources

Collaboration & Documentation:

  • Partner with analytics, ML, and product teams to translate requirements into data solutions
  • Document data designs, pipeline configurations, and operational runbooks
  • Participate in code reviews, capacity planning, and incident response

What You’ll Need:

  • 3+ years of professional data engineering experience
  • Proficiency in one or more languages: Python, Java, or Scala
  • Strong SQL skills and experience with relational databases (PostgreSQL, MySQL)
  • Hands-on experience with at least one orchestration framework (Airflow, Prefect, Dagster)
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and their data services
  • Experience with data warehousing solutions (Snowflake, BigQuery, Redshift)
  • Solid understanding of streaming technologies (Apache Kafka, Pub/Sub)
  • Ability to write clean, well-tested code and ETL configurations
  • Comfortable working in Agile/Scrum teams and collaborating cross-functionally

Preferred (Nice-to-Have)

  • Experience with data transformation tools (dbt, Matillion, Fivetran)
  • Knowledge of workflow engines or orchestration beyond ETL (Temporal, Airflow XComs)
  • Exposure to vector databases or embeddings pipelines for AI/ML use cases
  • Familiarity with LLM integration concepts: prompting, RAG, feature store design
  • Contributions to open-source data tools or active participation in data engineering communities

What We Offer

  • Impactful Projects: Build the data foundation for high-growth analytics and AI initiatives
  • Cutting-Edge Tech: Work with modern pipelines, cloud services, and real-time streaming
  • Professional Growth: Access mentorship, training budgets, and conference stipends
