Data Engineer
teamlink
Job Description
Job Responsibilities
- Understand long-term and short-term business requirements and match them to the capabilities of the distributed storage and computing technologies available in the ecosystem.
- Create complex data processing pipelines.
- Design scalable implementations of the models developed by our Data Scientists.
- Deploy data pipelines in production systems based on CI/CD practices.
- Create and maintain clear documentation on data models/schemas as well as transformation/validation rules.
- Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers.
Desired Skills and Experience
- 2+ years of overall industry experience building and deploying large-scale data processing pipelines in a production environment.
- Experience building data pipelines and data-centric applications using distributed storage platforms such as HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.), and distributed processing platforms such as Hadoop, Spark, Hive, Oozie, Airflow, etc.
- Hands-on experience with Hadoop distributions such as MapR, Cloudera, or Hortonworks, and/or cloud-based offerings (AWS EMR, Azure HDInsight, Qubole, etc.).
- Practical experience working with well-known data engineering tools and platforms such as Kafka, Spark, and Hadoop.
- Solid understanding of data modelling, ML, and AI concepts.
- Fluency in programming languages and runtimes such as Node.js, Java, or Python.
- Education: B.E. / B.Tech / M.Tech / MS.