Senior Data Engineer
Job Description
- Build scalable data ingestion pipelines for relational, semi-structured, and unstructured data sources
- Design, implement, and optimize lakehouse architectures using Apache Iceberg
- Optimize table design including partitioning, compaction, schema evolution, and performance tuning for Iceberg datasets
- Implement best practices for versioning, time travel, incremental processing, and ACID compliance
- Develop and optimize Apache Spark (batch and streaming) jobs for large-scale data processing
- Work extensively with AWS services such as Glue, EMR, Lambda, Step Functions, and S3, with a focus on cost and performance optimization
- Build and manage real-time data pipelines using Kafka and Kafka Streams
- Design and orchestrate workflows using DBT and Airflow
- Implement automated data quality checks, validation frameworks, and error monitoring mechanisms
- Establish observability frameworks including monitoring, logging, and alerting for data pipelines
- Collaborate with analytics/reporting teams to enable data quality dashboards and reporting
- Analyze existing pipelines to identify improvements and enhance reliability and scalability
- Leverage AI/LLM-based tools to accelerate ETL/ELT development, validation, and debugging
- Participate in code reviews and contribute to best practices and engineering standards
Skills
- Bachelor’s degree (or higher) in Computer Science, Engineering, or a related technical field
- 5+ years of experience designing, building, and maintaining data pipelines
- Strong programming skills in SQL, Python, and Apache Spark
- Hands-on experience with AWS data services (Glue, EMR, S3, Lambda, Step Functions)
- Deep understanding of lakehouse architectures and Apache Iceberg
- Experience with DBT and Airflow for data transformation and orchestration
- Strong experience with Kafka and real-time streaming pipelines
- Experience working with Snowflake as a cloud data warehouse
- Strong understanding of data quality frameworks, validation, and monitoring
- Experience handling structured, semi-structured, and unstructured data at scale
- Solid understanding of distributed systems and data engineering best practices
- Experience with CI/CD pipelines and automation (preferred)
- Strong problem-solving skills and ability to work in a fast-paced environment
- Excellent communication skills and ability to collaborate with cross-functional teams