Specialist, GSF DnA Data Engineer

MSD

Hyderabad | 5 Years Exp | Posted 10d ago

Job Description

What you will do 

  • Design, build, and operate batch and streaming data pipelines to ingest data from multiple sources into an AWS data lake / lakehouse and data warehouse.
  • Develop and maintain ETL/ELT transformations using Python, PySpark, and SQL; optimize jobs for performance, cost, and reliability.
  • Partner with Data Analysts, Data Scientists, and business stakeholders to understand use cases and deliver curated, analytics-ready datasets and features.
  • Implement data quality controls (validation rules, reconciliation, anomaly checks), define SLAs/SLOs, and contribute to metadata, lineage, and data catalog practices.
  • Run pipelines reliably using orchestration (e.g., Databricks Workflows, AWS Step Functions, scheduling) and observability (logging, monitoring, alerting).
  • Apply engineering best practices: unit/integration testing, automated data tests, code reviews, and quality gates within CI/CD.
  • Model and publish data for BI/analytics using dimensional modeling (star/snowflake schemas), fact and dimension tables, and slowly changing dimensions (SCD).
  • Write and tune advanced SQL for profiling, transformations, and performance troubleshooting across large datasets.
  • Build on AWS using services such as S3, Glue, Lambda, Step Functions, EMR, and CloudWatch; follow security best practices (IAM, encryption, least privilege).
  • Provision and manage cloud resources using Infrastructure as Code (e.g., Terraform) across dev/test/prod environments.
  • Package and deploy workloads using Docker (and where applicable ECS/Fargate); manage dependencies and runtime configurations.
  • Use GitHub for version control (branching strategies, pull requests, code reviews) and set up CI/CD for automated build, test, and deployment.
  • Develop scalable processing on Databricks / Apache Spark using PySpark and lakehouse concepts (e.g., Delta Lake, ACID transactions, schema evolution).
  • Use notebooks (e.g., Jupyter/Databricks) for exploration and PoCs, then productionize solutions with reusable modules, tests, and deployment pipelines.
  • Work in an Agile delivery model (planning, daily sync, reviews, retros), providing accurate estimates and proactively managing risks/dependencies.
  • Create and maintain technical documentation (data contracts, pipeline specs, runbooks) and support operational handoffs.
