Data Engineer - Big Data Technologies

hirist

Chennai, India | 10 Years Exp | Posted 71d ago

Responsibilities:

- Design, develop, and maintain robust and scalable data pipelines using Apache Spark and Scala on the Databricks platform.

- Implement ETL (Extract, Transform, Load) processes for various data sources, ensuring data quality, integrity, and efficiency.

- Optimize Spark applications for performance and cost-efficiency within the Databricks environment.

- Use Delta Lake to build reliable data lakes and data warehouses with ACID transactions and data versioning.

- Collaborate with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.

- Implement data governance and security best practices within Databricks.

- Troubleshoot and resolve data-related issues, ensuring data availability and reliability.

- Stay updated with the latest advancements in Spark, Scala, Databricks, and related big data technologies.

Required Skills and Experience:

- Proven experience as a Data Engineer with a strong focus on big data technologies.

- Expertise in the Scala programming language for data processing and Spark application development.

- In-depth knowledge and hands-on experience with Apache Spark, including Spark SQL, Spark Streaming, and Spark Core.

- Proficiency with Databricks platform features, including notebooks, jobs, workflows, and Unity Catalog.

- Experience with Delta Lake and its capabilities for building data lakes.

- Strong understanding of data warehousing concepts, data modeling, and relational databases.

- Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and their data services.

- Experience with version control systems like Git.

- Excellent problem-solving and analytical skills.

- Ability to work independently and as part of a team.

Preferred Qualifications (Optional):

- Experience with other big data technologies like Kafka, Flink, or Hadoop ecosystem components.

- Knowledge of data visualization tools.

- Understanding of DevOps principles and CI/CD pipelines for data engineering.

- Relevant certifications in Spark or Databricks.