Senior Data Engineer

alight

Chennai, India 5 Years Exp Posted 71d ago

Job Description

Core Responsibilities

Design, build, and maintain high‑volume ETL/ELT pipelines across Hadoop (HDFS, Hive, Spark, Kafka) and AWS (Glue, EMR, Lambda, Step Functions, Redshift).
Develop distributed data processing solutions using PySpark, Spark SQL, and scalable cloud serverless patterns.
Implement reusable data ingestion frameworks for batch (Sqoop, Hive, Spark) and streaming (Kafka, Kinesis).
Optimize data workflows using partitioning, bucketing, compression, and file formats (Parquet/ORC).
Understand hybrid data lake architectures that combine S3 and HDFS, ensuring consistent governance (Atlas, Ranger, Lake Formation).
Understand reporting requirements, perform data profiling, and create the corresponding designs.
Create data flow diagrams and perform data modelling.
Orchestrate jobs using Airflow, Control‑M, Step Functions, or event-driven triggers.
Understand auto-scaling, capacity planning, and performance tuning on EMR and Spark clusters.
Ensure data is protected and compliant with regulatory standards.
Work closely with business stakeholders to enable high‑quality datasets.
Provide technical leadership in architecture decisions, code reviews, and best‑practice adoption, and give technical guidance to peers and junior team members.
Improve reliability, scalability, and performance through automation, autoscaling, and capacity planning.
Own deployment, incident response, and post-incident reviews for production environments, troubleshooting Spark performance issues, job failures, and cluster bottlenecks.
Understand security best practices (IAM, KMS, security groups, WAF, parameter/secret management).
Optimize cost and usage of AWS resources and recommend architecture improvements.
Collaborate closely with developers, QA, and product teams to streamline release processes.

Requirements

Technical Skills

Strong experience (5-8 years) with the Hadoop ecosystem (HDFS, Hive, Spark, YARN, Kafka).
Strong hands-on expertise in Scala, PySpark, Spark optimization techniques, HiveQL, and distributed computing.
Good working experience with SQL in Hive and Impala.
Good understanding of AWS data stack (S3, Glue, EMR, Lambda, Kinesis, Redshift, Step Functions).
Proficiency in at least one scripting/programming language: Python, Shell scripting.
Strong experience with CI/CD, GitHub, Git commands.
Expertise in ETL, data warehousing, and cloud concepts.
Good understanding of data modelling (star/snowflake), partitioning strategies, and schema evolution.
Expertise in data profiling and decision making.
Able to understand, design, and create data flow diagrams and perform data modelling (knowledge of Miro is an added advantage).
Able to understand the architecture and design end-to-end data flow.
Hands-on experience with Airflow, Control‑M, or other orchestrators.
Monitor and support BAU (business-as-usual) and year-end activities as needed.
Well versed in cloud security and compliance.
Good understanding of AWS networking (VPC, subnets, routing, SGs, NACLs).
Familiarity with serverless patterns and containerization (Docker, ECS/EKS).
Experience with monitoring/logging tools and incident management practices.

Other Requirements

Strong logical, analytical, problem-solving, and communication skills.
Communicate effectively and concisely with multiple stakeholders, and coordinate and collaborate with cross-functional teams.
Ability to support both legacy Hadoop workloads and cloud-first architectures.
AWS certifications (Data Engineer, Solutions Architect, or Developer) are a plus.
Healthcare domain knowledge is good to have.
