Data Engineer
instahyre
Job Description
Responsibilities:
- Design, develop, and maintain scalable ETL/ELT data pipelines using PySpark.
- Implement real-time or near-real-time data processing using Apache Flink.
- Optimize data workflows for performance, scalability, and reliability.
- Work with large-scale data platforms and distributed environments.
- Collaborate with cross-functional teams to integrate data solutions into products and analytics platforms.
- Ensure data quality, integrity, and governance across pipelines.
- Conduct performance tuning, debugging, and root-cause analysis of data processes.
- Write clean, modular, and well-documented code following engineering best practices.
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.
Primary Skills:
- Strong hands-on experience with PySpark (RDD, DataFrame API, Spark SQL).
- Experience with Apache Flink (streaming or batch).
- Solid understanding of distributed computing concepts.
- Proficiency in Python for data engineering workflows.
- Strong SQL skills for data manipulation and transformation.
- Experience with data pipeline orchestration tools (Airflow, Step Functions, etc.).
Secondary Skills:
- Experience with cloud platforms (AWS, Azure, or GCP).
- Knowledge of data lakes, lakehouse architectures, and modern data stack tools.
- Familiarity with Delta Lake, Iceberg, or Hudi.
- Experience with CI/CD pipelines for data workflows.
- Understanding of messaging/streaming systems (Kafka, Kinesis).
- Knowledge of DevOps and container tools (Docker).
Soft Skills:
- Strong analytical and problem-solving capabilities.
- Ability to work independently and as part of a collaborative team.
- Good communication and documentation skills.
- Ownership mindset and willingness to learn and adapt.