Lead Data Engineer

irissoftware

Noida, UP, India | 9 Years Exp | Posted 46d ago

Job Description

•    Develop and maintain robust, scalable ETL/ELT pipelines using PySpark on AWS EMR
•    Build data ingestion and transformation workflows from diverse sources (S3, EMR, RDS, Kafka, APIs) into AWS-based data lakes and warehouses
•    Write clean, modular, testable Python code following best practices and coding standards
•    Implement comprehensive unit tests using pytest/unittest with mocking, fixtures, and high code coverage
•    Design and build production-grade Airflow DAGs for workflow orchestration, scheduling, and monitoring
•    Optimize Spark jobs for performance, memory efficiency, and cost reduction
•    Implement CI/CD pipelines for automated testing and deployment using Jenkins, GitHub Actions, or AWS CodePipeline
•    Troubleshoot and debug complex data pipeline issues in production environments
•    Collaborate with Data Scientists, Analysts, and Platform Engineers to deliver data solutions
•    Ensure data quality, security, and compliance standards are met

Required Skills & Qualifications
•    9+ years of hands-on data engineering experience (no management responsibilities required)
•    Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
•    Expert-level Python programming – OOP, design patterns, clean code practices
•    Advanced PySpark/Spark skills – partitioning strategies, shuffle optimization, memory tuning, broadcast joins
•    Strong unit testing expertise using pytest/unittest – mocking, parametrized tests, fixtures, TDD mindset
•    Hands-on Airflow experience – DAG design, custom operators, sensors, XComs, debugging failed tasks
•    Deep AWS experience: S3, EMR, Glue, Redshift, Lambda, Step Functions, IAM, CloudWatch
•    Solid understanding of data lake and warehouse architectures (medallion architecture, Delta Lake)
•    Strong SQL skills – complex queries, window functions, query optimization
•    Proficiency with Git, code reviews, and collaborative development workflows
•    Experience with CI/CD pipelines and automated testing frameworks

Nice to Have (Preferred)
•    Familiarity with Docker for containerized data workloads
•    Exposure to streaming data (Kafka, Spark Streaming)
•    Knowledge of data quality frameworks
•    Background in financial services or regulated industries
•    Understanding of data security and privacy practices (e.g., GDPR)
