Data Platform & Streaming Engineer

Fluor

Location: Vadodara, IN-GJ, India | Experience: 5 years | Posted 22 days ago

Job Description

Key Responsibilities

  • Design and implement scalable data platform components (lake/lakehouse, data marts, event streams) to support AI/ML and analytics use cases.
  • Build and maintain real-time and near-real-time streaming pipelines using tools such as Kafka or Azure Event Hubs and Spark Structured Streaming or Flink, applying established stream processing patterns.
  • Develop robust batch ingestion and transformation pipelines (ETL/ELT) that bring in data from SAP, engineering systems, SuccessFactors, and other enterprise systems, using Spark, SQL, and orchestration frameworks.
  • Implement data modeling standards (dimensional, Data Vault, medallion architecture) that support analytics and prepare data for use as ML features.
  • Ensure end-to-end data quality through validation rules, anomaly checks, schema evolution strategies, and automated testing.
  • Operationalize pipelines with CI/CD, infrastructure-as-code, version control, and environment promotion standards.
  • Establish observability (logging, metrics, tracing), SLOs, and incident response playbooks for data/streaming services.
  • Apply data governance controls: lineage, cataloging, retention, access policies, encryption, and privacy-by-design.
  • Optimize performance and cost across compute/storage by tuning jobs, partitioning strategies, caching, and streaming backpressure handling.
  • Collaborate with AI/ML engineers to enable feature stores, training data pipelines, and online/offline consistency patterns.
  • Interface with business/domain stakeholders (e.g., project controls, engineering, supply chain) to translate requirements into data products.
  • Document architectures, runbooks, and standards; mentor junior engineers and promote engineering excellence.

Basic Job Requirements

  • 5+ years of experience in data engineering, including streaming and distributed processing.
  • Strong hands-on experience with streaming platforms (e.g., Kafka, Azure Event Hubs, Confluent, Pulsar) and patterns (event-driven architecture, CDC, exactly-once/at-least-once).
  • Proficiency in Spark (PySpark/Scala) and SQL; experience with Spark Structured Streaming or equivalent.
  • Experience building data platforms in the cloud (preferably Azure): ADLS, Databricks, Synapse, Data Factory, Event Hubs, Azure Functions, and AKS.
  • Strong software engineering fundamentals: Python/Scala/Java, APIs, data structures, reliability patterns.
  • Familiarity with data lakehouse concepts, table and file formats (Delta Lake/Iceberg/Hudi, Parquet), and schema management.
  • Experience with CI/CD (Azure DevOps/GitHub Actions), Git, and IaC (Terraform/Bicep/ARM).
  • Understanding of security fundamentals: IAM/RBAC, secrets management, encryption, and compliance-aware data handling.

Other Job Requirements

Preferred Qualifications

  • Experience implementing CDC using Debezium, Kafka Connect, or cloud CDC services.
  • Knowledge of ML data enablement: feature engineering pipelines, feature stores, training/serving data consistency.
  • Experience with data governance tooling: Purview, Data Catalog, lineage/metadata management.
  • Exposure to containerization/orchestration (Docker, Kubernetes/AKS) for data services.
  • Experience with time-series/IoT or industrial data streams (e.g., sensors, telemetry), or EPC domain datasets.
  • Familiarity with test automation for data pipelines (Great Expectations, Deequ, custom frameworks) and data contract testing.
  • Certifications (optional): Azure Data Engineer Associate, Databricks certifications, or Kafka/Confluent certifications.
  • Proven experience supporting real-time streaming workloads and platform reliability in enterprise environments.
