Data Engineer

instahyre

Mumbai | 2 Years Exp | Posted 46d ago

Job Description

  • Design, develop, and maintain scalable ETL/ELT data pipelines using PySpark.
  • Implement real-time or near-real-time data processing using Apache Flink.
  • Optimize data workflows for performance, scalability, and reliability.
  • Work with large-scale data platforms and distributed environments.
  • Collaborate with cross-functional teams to integrate data solutions into products and analytics platforms.
  • Ensure data quality, integrity, and governance across pipelines.
  • Conduct performance tuning, debugging, and root-cause analysis of data processes.
  • Write clean, modular, and well-documented code following best engineering practices.

 

Requirements:

  • Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.

 

Primary Skills:

  • Strong hands-on experience in PySpark (RDD, DataFrame API, Spark SQL).
  • Experience with Apache Flink (streaming or batch).
  • Solid understanding of distributed computing concepts.
  • Proficiency in Python for data engineering workflows.
  • Strong SQL skills for data manipulation and transformation.
  • Experience with data pipeline orchestration tools (Airflow, Step Functions, etc.).

 

Secondary Skills:

  • Experience with cloud platforms (AWS / Azure / GCP).
  • Knowledge of data lakes, lakehouse architectures, and modern data stack tools.
  • Familiarity with Delta Lake, Iceberg, or Hudi.
  • Experience with CI/CD pipelines for data workflows.
  • Understanding of messaging/streaming systems (Kafka, Kinesis).
  • Knowledge of DevOps and container tools (Docker).

 

Soft Skills:

  • Strong analytical and problem-solving capabilities.
  • Ability to work independently and as part of a collaborative team.
  • Good communication and documentation skills.
  • Ownership mindset and willingness to learn and adapt.
