Data Engineer

global

Bengaluru (Bangalore), 2 years' experience, posted 88 days ago

Develop, maintain, and optimize Spark-based data processing pipelines using Scala.
Work with distributed computing frameworks and resource management systems such as YARN.
Ingest, process, and manage large datasets using tools across the Hadoop ecosystem (HDFS, Hive, HBase, Oozie, etc.).
Write complex SQL queries for data extraction, transformation, validation, and performance optimization.
Perform data validation, quality checks, and troubleshooting across datasets and jobs.
Monitor and improve data pipeline performance, ensuring high availability and reliability.
Participate in code reviews, documentation, and knowledge-sharing sessions.
Support ETL workflows, debug production issues, and maintain operational excellence.