Data Engineer (ETL & Cloud Data Pipelines)
ycombinator
Job Description
What You'll Do at CYBLE:
Pipeline Development:
- Architect and implement ETL/ELT workflows using tools like Apache Airflow, dbt, or equivalent
- Build batch and streaming pipelines with Kafka, Spark, Beam, or similar frameworks
- Ensure reliable ingestion from diverse sources (APIs, databases, logs, message queues)
Data Modeling & Warehousing:
- Design, optimize, and maintain star schemas, data vaults, and dimensional models
- Work with cloud warehouses (Snowflake, BigQuery, Redshift) or on-premise systems
Data Quality & Governance:
- Implement validation, profiling, and monitoring to ensure data accuracy and completeness
- Enforce data lineage, schema evolution, and versioning best practices
Platform Operations:
- Containerize and deploy pipelines via Docker/Kubernetes or managed services
- Build CI/CD for data workflows and maintain observability (Prometheus, Grafana, ELK, Datadog)
- Optimize performance and cost of storage, compute, and network resources
Collaboration & Documentation:
- Partner with analytics, ML, and product teams to translate requirements into data solutions
- Document data designs, pipeline configurations, and operational runbooks
- Participate in code reviews, capacity planning, and incident response
What You’ll Need:
- 3+ years of professional data engineering experience
- Proficiency in one or more languages: Python, Java, or Scala
- Strong SQL skills and experience with relational databases (PostgreSQL, MySQL)
- Hands-on experience with at least one orchestration framework (Airflow, Prefect, Dagster)
- Familiarity with cloud platforms (AWS, GCP, or Azure) and their data services
- Experience with data warehousing solutions (Snowflake, BigQuery, Redshift)
- Solid understanding of streaming technologies (Apache Kafka, Pub/Sub)
- Ability to write clean, well-tested code and ETL configurations
- Comfortable working in Agile/Scrum teams and collaborating cross-functionally
Preferred (Nice-to-Have):
- Experience with data transformation tools (dbt, Matillion, Fivetran)
- Knowledge of workflow engines or orchestration beyond ETL (Temporal, Airflow XComs)
- Exposure to vector databases or embeddings pipelines for AI/ML use cases
- Familiarity with LLM integration concepts: prompting, RAG, and feature store design
- Contributions to open-source data tools or active participation in data engineering communities
What We Offer:
- Impactful Projects: Build the data foundation for high-growth analytics and AI initiatives
- Cutting-Edge Tech: Work with modern pipelines, cloud services, and real-time streaming
- Professional Growth: Access mentorship, training budgets, and conference stipends