Data Engineer
global
Job Description
- Develop, maintain, and optimize Spark-based data processing pipelines using Scala.
- Work with distributed computing frameworks and resource management systems such as YARN.
- Ingest, process, and manage large datasets using tools across the Hadoop ecosystem (HDFS, Hive, HBase, Oozie, etc.).
- Write complex SQL queries for data extraction, transformation, validation, and performance optimization.
- Perform data validation, quality checks, and troubleshooting across datasets and jobs.
- Monitor and improve data pipeline performance, ensuring high availability and reliability.
- Participate in code reviews, documentation, and knowledge-sharing sessions.
- Support ETL workflows, debug production issues, and maintain operational excellence.