Data Engineer
Persistent
Job Description
- Translate client requirements into technical designs and end-to-end data engineering solutions
- Architect and implement cloud and non-cloud big data platforms
- Lead design discussions across data ingestion, transformation, storage, computation, and consumption
- Build and maintain batch and streaming data pipelines
- Implement ETL/ELT pipelines using distributed processing frameworks
- Drive performance optimization, scalability, and reliability of data solutions
- Design and implement solutions using Hadoop ecosystem components
- Ensure data governance, security, privacy, and quality standards are met
- Set up and manage secure big data platforms, including authentication and authorization
- Orchestrate workflows using enterprise pipeline orchestration tools
- Provide technical leadership and hands-on guidance to delivery teams
- Lead teams delivering solutions on cloud (AWS/GCP/Azure) and on-prem platforms
- Participate in client workshops and help align stakeholders to optimal architecture choices
- Manage functional and non-functional scope, SLAs, and delivery quality
- Mentor team members and contribute to thought leadership and best practices
- Support hiring, people management, and capability building initiatives
- Act as a trusted technical advisor to clients across programs
Expertise You'll Bring:
- 5+ years of overall IT experience with 3+ years in data engineering technologies
- 3+ years of hands-on experience with Big Data technologies
- 1+ years of experience delivering data solutions on AWS, Azure, or GCP
- Experience delivering at least one end-to-end Big Data architecture as a technical lead or architect
- Strong understanding of Big Data architecture patterns
- Expert-level knowledge of the Hadoop ecosystem (Cloudera or cloud distributions)
- Strong programming skills in Java or Scala (Python good to have)
- Expertise in data ingestion tools such as Sqoop, Flume, NiFi
- Experience with distributed messaging systems like Kafka, Pulsar, Pub/Sub
- Strong experience with Spark (Core, Streaming, SQL) and/or Storm, Flink
- Experience with MPP query engines such as Impala, Presto, Athena
- Hands-on experience with NoSQL databases (MongoDB, Cassandra, HBase) or cloud NoSQL (DynamoDB, BigTable)
- Knowledge of cluster security, including data-at-rest and data-in-transit protection
- Experience implementing monitoring and alerting for big data platforms
- Hands-on experience with orchestration tools such as Oozie, Airflow, Control-M
- Proven experience with performance tuning, optimization, and data security