Data Engineer

instahyre

Mumbai, India | 2 Years Exp | Posted 83d ago

Responsibilities:

Design, develop, and maintain scalable ETL/ELT data pipelines using PySpark.
Implement real-time or near-real-time data processing using Apache Flink.
Optimize data workflows for performance, scalability, and reliability.
Work with large-scale data platforms and distributed environments.
Collaborate with cross-functional teams to integrate data solutions into products and analytics platforms.
Ensure data quality, integrity, and governance across pipelines.
Conduct performance tuning, debugging, and root-cause analysis of data processes.
Write clean, modular, and well-documented code following best engineering practices.

Requirements:

Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.

Primary Skills:

Strong hands-on experience in PySpark (RDD, DataFrame API, Spark SQL).
Experience with Apache Flink (streaming or batch).
Solid understanding of distributed computing concepts.
Proficiency in Python for data engineering workflows.
Strong SQL skills for data manipulation and transformation.
Experience with data pipeline orchestration tools (Airflow, Step Functions, etc.).

Secondary Skills:

Soft Skills:

Strong analytical and problem-solving capabilities.
Ability to work independently and as part of a collaborative team.
Good communication and documentation skills.
Ownership mindset and willingness to learn and adapt.