Data Engineer
carrier
Job Description
- Work closely with data scientists, analysts, and business stakeholders to understand data requirements and objectives for machine learning and analytics projects.
- Design, develop, and maintain scalable and efficient data pipelines for transforming and cleansing raw data into ML-ready datasets.
- Implement data transformation logic and algorithms using Apache Spark (for example, through PySpark) or similar frameworks to preprocess and clean data.
- Use cloud-based data warehouse solutions such as Amazon Redshift to store and manage large volumes of structured and semi-structured data.
- Collaborate with data architects and database administrators to optimize data models, schema designs, and query performance for analytics and reporting purposes.
- Ensure data quality and integrity by implementing data validation checks, error handling mechanisms, and monitoring processes throughout the data pipeline.
- Work with cross-functional teams to identify and address data integration and interoperability challenges, including synchronization, consistency, and governance.
- Stay up to date with advancements in data engineering, big data technologies, and machine learning techniques, and proactively apply new methodologies and best practices to improve data processing workflows.