Data Engineer - II
Job Description
Key Responsibilities:
* Design, build, and maintain efficient, scalable, and performant data engineering pipelines to ingest, sanitize, transform (ETL/ELT), and deliver high-volume, high-velocity data from diverse sources.
* Ensure reliable and consistent processing of varied workloads across processing modes such as real-time, near-real-time, mini-batch, batch, and on-demand.
* Translate business requirements into technical specifications, design and implement solutions with clear documentation.
* Develop, optimize, and support production data workflows to ensure comprehensive and accurate datasets.
* Apply a software engineering mindset and discipline, adopting best practices to write modular code that is maintainable, scalable, and performant.
* Use orchestration and scheduling tools (e.g., Airflow, GitLab Runners) to streamline workflows.
* Automate deployment and monitoring of data workflows using CI/CD best practices. Use DevOps best practices to instrument retries and self-healing to minimize, if not avoid, manual intervention.
* Use AI/Gen AI tools and technologies to build/generate reusable code.
* Collaborate with architects, data scientists, BI engineers, analysts, product/program management, and other stakeholders to deliver end-to-end solutions.
* Promote strategies to improve our data modelling, quality, and architecture.
* Mentor junior engineers and contribute to team knowledge sharing.
* Bring an analytics mindset to explore and identify opportunities for improved metrics and insights that contribute to business growth.
Qualifications:
* Master's or Bachelor's degree in Computer Science or an associated discipline, with relevant industry experience (3+ years) in a data engineering role.
* Strong proficiency in writing and analyzing complex SQL, Python, or any 4GL.
* Strong experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools (e.g., Terraform, CloudFormation).
* Strong hands-on experience with cloud data warehouses like Snowflake, Redshift, BigQuery, or other big data solutions.
* Expertise and experience with data lake and open table format technologies (e.g., Apache Iceberg).
* Expertise and experience with distributed data processing frameworks (e.g., Apache Spark, Flink, Beam, Trino).
* Experience with real-time/streaming data technologies (e.g., Kafka, Kinesis, Spark Streaming).
* Experience using modern pipeline orchestration tools such as Airflow.
* Knowledge of data warehousing concepts, data modelling, and performance optimization techniques.
* Experience with version control (e.g., Git) and CI/CD workflows.
* Familiarity with BI tools like Looker, Tableau, and Power BI, as well as dashboard design, data quality, data governance, and data observability tools.
* Experience with containerization (Docker, Kubernetes).
* Experience working in Agile development environments and with tools like JIRA or Confluence.
* Strong problem-solving, analytical, and debugging skills.
* Excellent communication and collaboration skills, with the ability to work across business, analytics, and engineering teams.