Data Engineer
Avature
Job Description
Data Infrastructure for AI/ML:
- Design and implement robust data pipelines that support data preprocessing, model training, and deployment.
- Optimize data pipelines for the high-volume, high-velocity data that ML models require.
- Build and manage feature stores that can efficiently store, retrieve, and serve features for ML models.
AI/ML Model Integration:
- Collaborate with ML engineers and data scientists to integrate machine learning models into production environments.
- Implement tools for model versioning, experimentation, and deployment (e.g., MLflow, Kubeflow, TensorFlow Extended).
- Support automated retraining and model monitoring pipelines to ensure models remain performant over time.
Data Architecture & Design:
- Design and maintain scalable, efficient, and secure data pipelines and architectures.
- Develop data models (both OLTP and OLAP).
- Create and maintain ETL/ELT processes.
Data Pipeline Development:
- Build automated pipelines to collect, transform, and load data from various sources (internal and external).
- Optimize data flow and collection for cross-functional teams.
MLOps Support:
- Develop CI/CD pipelines to deploy models into production environments.
- Implement model monitoring, alerting, and logging for real-time model predictions.
Data Quality & Governance:
- Ensure high data quality, integrity, and availability.
- Implement data validation, monitoring, and alerting mechanisms.
- Support data governance initiatives and ensure compliance with data privacy laws (e.g., GDPR, HIPAA).
Tooling & Infrastructure:
- Work with cloud platforms (AWS, Azure, GCP) and data engineering tools such as Apache Spark, Kafka, and Airflow.
- Use containerization (Docker, Kubernetes) and CI/CD pipelines for data engineering deployments.
Team Collaboration & Mentorship:
- Collaborate with data scientists, analysts, product managers, and other engineers.
- Provide technical leadership and mentor junior data engineers.
Soft Skills:
- Strong problem-solving and critical-thinking skills.
- Excellent communication and collaboration abilities.
- Leadership experience and the ability to guide technical decisions.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 4+ years of experience in data engineering.
- Strong understanding of data modeling, ETL/ELT concepts, and distributed systems.
- Experience with big data tools and cloud platforms.