Specialist, GSF DnA Data Engineer

msd

Hyderabad • 5 Years Exp • Posted 59d ago

Job Description

What you will do

Design, build, and operate batch and streaming data pipelines to ingest data from multiple sources into an AWS data lake / lakehouse and data warehouse.

Develop and maintain ETL/ELT transformations using Python, PySpark, and SQL; optimize jobs for performance, cost, and reliability.

Partner with Data Analysts, Data Scientists, and business stakeholders to understand use cases and deliver curated, analytics-ready datasets and features.

Implement data quality controls (validation rules, reconciliation, anomaly checks), define SLAs/SLOs, and contribute to metadata, lineage, and data catalog practices.

Use orchestration (e.g., Databricks Workflows, AWS Step Functions, scheduling) and observability (logging, monitoring, alerting) to run pipelines reliably.

Apply engineering best practices: unit/integration testing, automated data tests, code reviews, and quality gates within CI/CD.

Model and publish data for BI/analytics using dimensional modeling: star/snowflake schemas, fact and dimension tables, and slowly changing dimensions (SCD).

Write and tune advanced SQL for profiling, transformations, and performance troubleshooting across large datasets.

Build on AWS using services such as S3, Glue, Lambda, Step Functions, EMR, and CloudWatch; follow security best practices (IAM, encryption, least privilege).

Provision and manage cloud resources using Infrastructure as Code (e.g., Terraform) across dev/test/prod environments.

Package and deploy workloads using Docker (and, where applicable, ECS/Fargate); manage dependencies and runtime configurations.

Use GitHub for version control (branching strategies, pull requests, code reviews) and set up CI/CD for automated build, test, and deployment.

Develop scalable processing on Databricks / Apache Spark using PySpark and lakehouse concepts (e.g., Delta Lake, ACID, schema evolution).

Use notebooks (e.g., Jupyter/Databricks) for exploration and PoCs, then productionize solutions with reusable modules, tests, and deployment pipelines.

Work in an Agile delivery model (planning, daily syncs, reviews, retrospectives), providing accurate estimates and proactively managing risks and dependencies.

Create and maintain technical documentation (data contracts, pipeline specs, runbooks) and support operational handoffs.
