Data Engineer

Google

Bangalore | 3 Years Exp | Posted 1h ago

Job Description

  • Design, build, and manage real-time data pipelines using tools like Apache Kafka, Apache Flink, and Apache Spark Streaming.
  • Optimize data pipelines for performance, scalability, and fault-tolerance.
  • Perform real-time transformations, aggregations, and joins on streaming data.
  • Collaborate with data scientists to onboard new features and ensure they're discoverable, documented, and versioned.
  • Optimize feature retrieval latency for real-time inference use cases.
  • Ensure strong data governance: lineage, auditing, schema evolution, and quality checks using tools such as dbt and OpenLineage.

Requirements:

  • Bachelor's degree in Engineering from a premier institute (IIT/NIT/BIT)
  • 3-5 years of experience in an Indian startup or tech company
  • Strong programming skills in Python, Java, or Scala, and proficiency in SQL.
  • Solid understanding of data modeling, data warehousing concepts, and the differences between OLTP and OLAP workloads.
  • Experience ingesting and processing various data formats, including semi-structured (JSON, Avro), unstructured, and document-based data from sources like NoSQL databases (e.g., MongoDB), APIs, and event tracking platforms (e.g., PostHog).
  • Hands-on experience with Change Data Capture (CDC) tools such as Debezium or AWS DMS for replicating data from transactional databases.
  • Proven experience designing and building scalable data lakes or lakehouse architectures on platforms like Databricks.
  • Hands-on experience with modern open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • Hands-on experience with real-time streaming technologies like Kafka, Flink, and Spark Streaming.
  • Proficiency with data pipeline orchestration tools like Apache Airflow.
  • Exposure to event-driven microservices architecture.
  • Strong written and verbal communication skills.
