Data Engineer

worldbankgroup

Chennai, India | 3 years of experience

Job Description

The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. With limited supervision, this role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements, ultimately supporting the organization's data integration and transformation initiatives.

Key Responsibilities: 

Data Pipeline Development

• Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads

• Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms

• Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL

• Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization

• Build streaming data pipelines for real-time analytics and operational use cases

• Optimize pipeline performance, resource utilization, and cost efficiency

Federated Data Pipelines & Domain Enablement

• Support a federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data

• Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently

• Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy

• Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards

• Enable distributed data processing across domains while ensuring consistency through federated governance

• Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains

• Help balance domain autonomy with enterprise-wide governance requirements

Templates, Blueprints & Patterns

• Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types

• Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize

• Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations

• Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging

• Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines

• Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products

• Create cookbooks and implementation guides that translate enterprise standards into actionable steps

• Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs

Data Integration

• Integrate data from multiple internal and external sources into unified data assets

• Build reusable data integration patterns and connectors for enterprise data sources

• Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks

• Develop API-based data integrations and file-based data processing workflows

• Ensure data consistency and reliability across integrated sources

• Support data migration efforts and legacy system integrations

Data Modeling & Transformation

• Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression

• Develop dimensional models, fact tables, and aggregations for analytics use cases

• Build data transformation logic that ensures accuracy, consistency, and business alignment

• Create reusable transformation components and modular pipeline designs

• Optimize data models for query performance and consumption patterns

• Support schema evolution and data versioning requirements

Data Quality & Testing

• Implement data quality checks, validation rules, and automated testing within pipelines

• Develop data profiling and anomaly detection to identify quality issues

• Build data reconciliation processes to en