Sr. Cloud Engineer
darwinbox
Job Description
Key Responsibilities:
- Build and manage end-to-end analytics workloads using Microsoft Fabric, OneLake, Lakehouses, and Warehouses. Implement Direct Lake connectivity for high-performance Power BI reporting.
- Design and develop scalable data processing engines using Azure Databricks. Leverage PySpark for complex transformations, streaming, and large-scale data wrangling.
- Architect multi-stage data pipelines using Azure Data Factory (ADF) and Synapse Pipelines. Focus on metadata-driven frameworks and dynamic orchestration to minimize hard-coding.
- Advanced Scripting & Transformation:
  - SQL Scripts: Write high-performance T-SQL and Spark SQL for complex business logic, data validation, and performance tuning in Synapse Dedicated/Serverless pools.
  - Python: Develop custom Python modules for API integrations, automation scripts, and advanced data manipulation beyond standard ETL tools.
- Implement the Medallion Architecture (Bronze/Silver/Gold) using Delta Lake tables to support ACID transactions, data lineage, and schema evolution.
Required Skills & Expertise
- Primary Toolset: Extensive hands-on experience with Microsoft Fabric, Azure Databricks, Azure Synapse Analytics, and Azure Data Factory.
- Expert Scripting (Python): Deep proficiency in Python and PySpark (Spark Core, Spark SQL, Structured Streaming).
- Advanced SQL: Expert-level SQL scripting (window functions, CTEs, stored procedures, and query optimization).
- Delta Lake & OneLake: Strong understanding of Delta Lake table formats and the unified storage principles of OneLake.
- Orchestration Patterns: Proven ability to build reusable, parameter-driven pipelines in ADF that handle diverse data sources (REST APIs, Parquet, Delta).
- DevOps & CI/CD (good to have): Experience with Git-based version control (Azure DevOps/GitHub) and deploying data infrastructure as code (Bicep/Terraform).