Data Engineer
Thermo Fisher Scientific
Job Description
- Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or related field
- 3-5 years of experience in data engineering, ETL development, SQL, AWS data platforms, or production data pipeline support
Major Job Responsibilities:
- Develop, test, tune, and maintain ETL and data pipelines using PySpark, Python, SQL, and AWS services
- Support ingestion and transformation of flat files, relational databases, APIs, data warehouses, and enterprise data sources
- Collaborate with business analysts, data architects, QA, DevOps, and senior engineers to implement source-to-target mappings and data solutions
- Implement change data capture (CDC), incremental load design, idempotent pipeline processing, and data reconciliation patterns for reliable data movement
- Maintain technical documentation, mapping specifications, data catalog updates, runbooks, automated tests, and release support materials
Knowledge, Skills, and Abilities:
- Hands-on experience with PySpark, Python, advanced SQL, ETL best practices, data modeling, and large-scale data processing
- Deep knowledge of Redshift performance tuning, including distribution keys, sort keys, compression encoding, Redshift Spectrum, materialized views, workload management (WLM), VACUUM, and ANALYZE
- Strong knowledge of Athena optimization, including partition pruning, file formats, compression, schema evolution, and cost-efficient query design
- Strong understanding of DynamoDB data modeling, access-pattern-based design, capacity planning, GSIs/LSIs, TTL, Streams, and performance tuning
- Exposure to secure PHI/PII handling, including encryption, access controls, auditability, retention, masking, and de-identification where applicable
- Strong analytical, troubleshooting, documentation, communication, and cross-functional collaboration skills