Data Engineer

Google

Bangalore, 3 Years Exp

Design, build, and manage real-time data pipelines using tools like Apache Kafka, Apache Flink, and Apache Spark Streaming.
Optimize data pipelines for performance, scalability, and fault-tolerance.
Perform real-time transformations, aggregations, and joins on streaming data.
Collaborate with data scientists to onboard new features and ensure they're discoverable, documented, and versioned.
Optimize feature retrieval latency for real-time inference use cases.
Ensure strong data governance: lineage, auditing, schema evolution, and quality checks using tools such as dbt and OpenLineage.

Requirements:

Bachelor's degree in Engineering from a premier institute (IIT/NIT/BIT).
3-5 years of experience in an Indian startup or tech company.
Strong programming skills in Python, Java, or Scala, and proficiency in SQL.
Solid understanding of data modeling, data warehousing concepts, and the differences between OLTP and OLAP workloads.
Experience ingesting and processing various data formats, including semi-structured (JSON, Avro), unstructured, and document-based data from sources like NoSQL databases (e.g., MongoDB), APIs, and event tracking platforms (e.g., PostHog).
Hands-on experience with Change Data Capture (CDC) tools such as Debezium or AWS DMS for replicating data from transactional databases.
Proven experience designing and building scalable data lakes or lakehouse architectures on platforms like Databricks.
Hands-on experience with modern open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
Hands-on experience with real-time streaming technologies like Kafka, Flink, and Spark Streaming.
Proficiency with data pipeline orchestration tools like Apache Airflow.
Exposure to event-driven microservices architecture.
Strong written and verbal communication skills.