Data Engineer
Amgen
Job Description
Roles & Responsibilities:
-
Design, develop, and maintain data solutions for data generation, collection, and processing
-
Serve as a key team member who assists in the design and development of data pipelines
-
Create data pipelines and ensure data quality by implementing ETL processes to migrate and deploy data across systems
-
Contribute to the design, development, and implementation of data pipelines, ETL/ELT processes, and data integration solutions
-
Take ownership of data pipeline projects from inception to deployment, managing scope, timelines, and risks
-
Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
-
Develop and maintain data models, data dictionaries, and other documentation to ensure data accuracy and consistency
-
Implement data security and privacy measures to protect sensitive data
-
Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
-
Collaborate and communicate effectively with product teams
-
Collaborate with Data Architects, Business SMEs, and Data Scientists to design and develop end-to-end data pipelines to meet fast-paced business needs across geographic regions
-
Identify and resolve complex data-related challenges
-
Adhere to best practices for coding, testing, and designing reusable code and components
-
Explore new tools and technologies that can improve ETL platform performance
-
Participate in sprint planning meetings and provide estimates for technical implementation
-
Design and develop data pipelines leveraging Databricks, PySpark, and SQL to ingest, transform, and process large-scale datasets.
-
Engineer solutions for both structured and unstructured data to enable advanced analytics and insights.
-
Implement automated workflows for data ingestion, transformation, and deployment using Databricks Jobs and notebooks, with ongoing monitoring and scheduling.
-
Apply performance optimization techniques, including Spark job tuning, caching, partitioning, and indexing, to improve scalability and efficiency.
-
Build integrations with multiple data sources, such as SQL databases, APIs, and cloud storage platforms, ensuring seamless connectivity and reliability.
-
Collaborate effectively with global teams across time zones to maintain alignment, resolve issues, and deliver on shared objectives.
Basic Qualifications and Experience:
-
Bachelor’s or Master’s degree and 4 to 8 years of experience in Computer Science, IT, or a related field
Functional Skills:
Must-Have Skills:
-
Hands-on experience with big data technologies and platforms, such as Databricks and Apache Spark (PySpark, SparkSQL), including workflow orchestration and performance tuning for big data processing
-
Proficiency in data analysis tools (e.g., SQL) and experience with data visualization tools
-
Excellent problem-solving skills and the ability to work with large, complex datasets
-
Strong understanding of data governance frameworks, tools, and best practices.
Good-to-Have Skills:
-
Knowledge of data protection regulations and compliance requirements (e.g., GDPR, CCPA)
-
Experience with ETL tools such as Apache Spark, and with Python packages for data processing and machine learning model development
-
Strong understanding of data modeling, data warehousing, and data integration concepts
-
Knowledge of Python/R, Databricks, SageMaker, and cloud data platforms
-
Experience implementing automated orchestration and monitoring of data pipelines using Databricks Jobs, Apache Airflow, or similar workflow tools.
-
Familiarity with performance optimization techniques for big data processing, such as Spark job tuning, caching, partitioning, and indexing.
-
Exposure to multi-source integration involving APIs, SQL databases, and cloud storage platforms.
-
Demonstrated ability to collaborate across global teams and time zones.