Senior Data Engineer

taleo

Hyderabad | 5 Years Exp | Posted 66d ago

Build scalable data ingestion pipelines for relational, semi-structured, and unstructured data sources
Design, implement, and optimize lakehouse architectures using Apache Iceberg
Optimize table design including partitioning, compaction, schema evolution, and performance tuning for Iceberg datasets
Implement best practices for versioning, time travel, incremental processing, and ACID compliance
Develop and optimize Apache Spark (batch and streaming) jobs for large-scale data processing
Work extensively with AWS services such as Glue, EMR, Lambda, Step Functions, and S3 with a focus on cost and performance optimization
Build and manage real-time data pipelines using Kafka and Kafka Streams
Design and orchestrate workflows using dbt and Airflow
Implement automated data quality checks, validation frameworks, and error monitoring mechanisms
Establish observability frameworks including monitoring, logging, and alerting for data pipelines
Collaborate with analytics/reporting teams to enable data quality dashboards and reporting
Analyze existing pipelines to identify improvements and enhance reliability and scalability
Leverage AI/LLM-based tools to accelerate ETL/ELT development, validation, and debugging
Participate in code reviews and contribute to best practices and engineering standards

Skills

Bachelor’s degree (or higher) in Computer Science, Engineering, or a related technical field
5+ years of experience designing, building, and maintaining data pipelines
Strong programming skills in SQL, Python, and Apache Spark
Hands-on experience with AWS data services (Glue, EMR, S3, Lambda, Step Functions)
Deep understanding of lakehouse architectures and Apache Iceberg
Experience with dbt and Airflow for data transformation and orchestration
Strong experience with Kafka and real-time streaming pipelines
Experience working with Snowflake as a cloud data warehouse
Strong understanding of data quality frameworks, validation, and monitoring
Experience handling structured, semi-structured, and unstructured data at scale
Solid understanding of distributed systems and data engineering best practices
Experience with CI/CD pipelines and automation (preferred)
Strong problem-solving skills and ability to work in a fast-paced environment
Excellent communication skills and ability to collaborate with cross-functional teams