Data Engineer
BNP Paribas
Job Description
Direct Responsibilities:
- Migrate the existing Hadoop infrastructure to cloud infrastructure on Kubernetes Engine, COS, Spark as a service, and Airflow as a service.
- Implement data transformation and data quality checks to ensure data consistency and accuracy. Use programming languages such as Scala and SQL and tools like Spark for data transformation and enrichment operations.
- Set up CI/CD pipelines to automate deployments, unit testing, and development management.
- Write and conduct unit and validation tests to ensure accuracy and integrity of code developed.
- Automate data pipelines and streamline data ingestion through the implementation of different orchestrators and scheduling processes (mainly Airflow as a Service).
- Write technical documentation (specifications, operational documents) to ensure knowledge capitalization.
Contributing Responsibilities:
- Team player
- Adhere to the standards and practices followed in the Project
- Foster a culture of continuous learning and improvement within the team.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
Technical & Behavioral Competencies
Technical Skills
- At least 5 years of working experience in data engineering
- Working experience with Spark on Scala/Python/Java (any of these languages)
- Knowledge of Apache Airflow, Oozie, or any other similar scheduler tools
- Strong knowledge of SQL and NoSQL databases
- Good exposure to CI/CD tools (GitLab, Jenkins…)
- Knowledge of Kubernetes containerization
- Integration experience with S3 storage/COS and the Parquet (and ORC) formats
- Hands-on knowledge of Unix shell scripting
- Design effective prompts to leverage Gen AI tools across IT domains (e.g., development, testing, data generation, documentation) during the development stage.
- Nice to have: exposure to any data virtualization tool such as Dremio
- Nice to have: exposure to Kafka, Elasticsearch, Kibana, Hvault
- Nice to have: Working knowledge of HDFS, Hadoop and Hive