Data Engineer

teamlink

Remote | 2 Years Exp | Posted 81d ago

Job Responsibilities

Understand long-term and short-term business requirements and match them to the capabilities of the distributed storage and computing technologies available in the ecosystem.
Create complex data processing pipelines.
Design scalable implementations of the models developed by our Data Scientists.
Deploy data pipelines to production systems following CI/CD practices.
Create and maintain clear documentation on data models/schemas as well as transformation/validation rules.
Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers.

Job Requirements

2+ years of overall industry experience building and deploying large-scale data processing pipelines in a production environment.
Experience building data pipelines and data-centric applications using distributed storage platforms such as HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.), and distributed processing platforms such as Hadoop, Spark, Hive, Oozie, Airflow, etc.
Hands-on experience with MapR, Cloudera, Hortonworks, and/or cloud-based (AWS EMR, Azure HDInsight, Qubole, etc.) Hadoop distributions.
Practical experience working with well-known data engineering tools and platforms such as Kafka, Spark, and Hadoop.
Solid understanding of data modelling, ML, and AI concepts.
Fluent in programming with Node.js, Java, or Python.
Education: B.E / B Tech / M Tech / MS.