Data Engineer

teamlink

Remote 2 Years Exp Posted 32d ago

Job Description

Job Responsibilities

  • Understand long-term and short-term business requirements and match them precisely to the capabilities of the available distributed storage and computing technologies.
  • Create complex data processing pipelines.
  • Design scalable implementations of the models developed by our Data Scientists.
  • Deploy data pipelines to production systems following CI/CD practices.
  • Create and maintain clear documentation on data models/schemas as well as transformation/validation rules.
  • Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers.

Desired Skills and Experience

  • 2+ years of overall industry experience building and deploying large-scale data processing pipelines in a production environment.
  • Experience building data pipelines and data-centric applications using distributed storage platforms such as HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.), and distributed processing platforms such as Hadoop, Spark, Hive, Oozie, Airflow, etc.
  • Hands-on experience with MapR, Cloudera, Hortonworks, and/or cloud-based (AWS EMR, Azure HDInsight, Qubole, etc.) Hadoop distributions.
  • Practical experience working with well-known data engineering tools and platforms such as Kafka, Spark, and Hadoop.
  • Solid understanding of data modelling, machine learning (ML), and AI concepts.
  • Fluency in programming languages and runtimes such as Node.js, Java, or Python.
  • Education: B.E. / B.Tech / M.Tech / MS.
