Sr Data Engineer
Amgen
Job Description
- Design, develop, and maintain scalable ETL/ELT pipelines to support structured, semi-structured, and unstructured data processing across Enterprise Data Engineering, drawing on functional knowledge of Biotech or Pharma R&D.
- Implement real-time and batch data processing solutions, integrating data from multiple sources into a unified, governed data fabric architecture.
- Optimize big data processing on Apache Spark, Hadoop, or similar distributed computing frameworks to ensure high availability and cost efficiency.
- Work with metadata management and data lineage tracking tools to enable enterprise-wide data discovery and governance.
- Ensure data security, compliance, and role-based access control (RBAC) across data environments.
- Optimize query performance, indexing strategies, partitioning, and caching for large-scale data sets.
- Develop CI/CD pipelines for automated data pipeline deployments, version control, and monitoring.
- Implement data virtualization techniques to provide seamless access to data across multiple storage systems.
- Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
- Stay up to date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.
- Model data for analytics and ML (star/snowflake, Data Vault, semantic layers) and implement robust ELT patterns (dbt or equivalent).
- Build and maintain a lakehouse/warehouse (e.g., Delta Lake/Iceberg/Hudi; Snowflake/Redshift/BigQuery) with partitioning, clustering, and cost/performance optimization.
- Orchestrate workflows with Airflow/Azure Data Factory/Prefect and implement CI/CD for data (Git-based deployments, environments, automated tests).
- Implement data quality and observability (Great Expectations/Deequ, expectations-as-code, lineage/metadata, SLOs and alerting with OpenTelemetry/Prometheus/Datadog).
- Enforce security and governance (RBAC/ABAC, encryption, secrets, tokenization), manage PII/PHI under GDPR/CCPA, and apply secure SDLC practices for data.
- Partner with analytics, data science, and product teams to define interfaces, SLAs, and contracts; publish clear docs, runbooks, and diagrams.
- Lead technical discovery, RFCs, and POCs; evaluate vendor tools and guide integrations.
- Mentor engineers; raise the bar on code quality, reviews, and engineering practices.