Senior Data Engineer
weareroku
Job Description
Big Data Engineering:
- Design, develop, and maintain data pipelines and ETL workflows using Apache Spark and Apache Airflow.
- Optimise data storage, retrieval, and processing systems to ensure reliability, scalability, and performance.
- Develop and fine-tune complex queries and data processing jobs for large-scale datasets.
- Monitor, troubleshoot, and improve data systems for minimal downtime and maximum efficiency.
Software Development:
- Write clean, maintainable, and efficient code, ensuring adherence to best practices through code reviews.
Collaboration & Mentorship:
- Partner with data scientists, software engineers, and other teams to deliver integrated, high-quality solutions.
- Provide technical guidance and mentorship to junior engineers, promoting best practices in data engineering.
AI-Augmented Engineering & Intelligent Data Interfaces:
- Apply modern AI-assisted development practices responsibly (for example, assisted code review, test generation, and documentation) while maintaining production quality, security, and compliance standards.
- Design and evolve semantic search and retrieval over internal metadata (datasets, lineage, dashboards, runbooks), using embeddings, indexing, and guardrailed query interfaces where they improve engineer and analyst productivity.
- Stay current on responsible AI expectations relevant to advertising data: privacy, PII handling, access control, auditability, and human-in-the-loop review for high-risk automation.
We’re excited if you have
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 10+ years of experience in software and/or data engineering, with expertise in big data technologies such as Apache Spark, Apache Airflow, and Trino.
- Strong understanding of SOLID principles and distributed systems architecture.
- Proven experience in distributed data processing, data warehousing, and real-time data pipelines.
- Advanced SQL skills, with expertise in query optimisation for large datasets.
- Exceptional problem-solving abilities and the capacity to work independently or collaboratively.
- Excellent verbal and written communication skills.
- Experience with cloud platforms such as AWS, GCP, or Azure, and containerisation tools like Docker and Kubernetes (preferred).
- Familiarity with additional big data technologies, including Hadoop, Kafka, and Trino (preferred).
- Strong programming skills in Python, Java, or Scala (preferred).
- Knowledge of CI/CD pipelines, DevOps practices, and infrastructure-as-code tools such as Terraform (preferred).
- Expertise in data modelling, schema design, and data visualisation tools (preferred).