Data Engineer

thermofisher

Bengaluru, India | 3 Years Exp | Posted 1h ago

Job Description

  • Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or related field
  • 3-5 years of experience in data engineering, ETL development, SQL, AWS data platforms, or production data pipeline support

 

Major Job Responsibilities:

  • Develop, test, tune, and maintain ETL and data pipelines using PySpark, Python, SQL, and AWS services
  • Support ingestion and transformation of flat files, relational databases, APIs, data warehouses, and enterprise data sources
  • Collaborate with business analysts, data architects, QA, DevOps, and senior engineers to implement source-to-target mappings and data solutions
  • Implement CDC, incremental load design, idempotent pipeline processing, and data reconciliation patterns for reliable data movement
  • Maintain technical documentation, mapping specifications, data catalog updates, runbooks, automated tests, and release support materials

 

Knowledge, Skills, and Abilities:

  • Hands-on experience with PySpark, Python, advanced SQL, ETL best practices, data modeling, and large-scale data processing
  • Deep knowledge of Redshift performance tuning including distribution keys, sort keys, compression encoding, Spectrum, materialized views, WLM, vacuum, and analyze
  • Strong knowledge of Athena optimization including partition pruning, file formats, compression, schema evolution, and cost-efficient query design
  • Strong understanding of DynamoDB data modeling, access-pattern-based design, capacity planning, GSIs/LSIs, TTL, Streams, and performance tuning
  • Exposure to secure PHI/PII handling including encryption, access controls, auditability, retention, masking, and de-identification where applicable
  • Strong analytical, troubleshooting, documentation, communication, and cross-functional collaboration skills

 

Must Have Skills:

  • PySpark, Python, advanced SQL, ETL development, and data pipeline implementation experience
  • Experience with AWS data services and related databases, including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, and Athena, plus integration with PostgreSQL and SQL Server
  • Flat-file ingestion, source-to-target mapping, transformation logic, CDC, incremental loads, idempotent processing, reconciliation, and data quality checks
  • CI/CD, GitHub workflows, automated testing, and release management for data pipelines and database changes
  • Problem-solving, production support, debugging, documentation, and Agile delivery skills

 

Good to Have Skills:

  • Exposure to AI-assisted mapping automation and use of LLMs for data cleaning, data quality checks, transformation logic, or documentation
  • Familiarity with RAG patterns, embeddings, vector databases, semantic search, or AI-enabled data discovery solutions
  • Understanding of healthcare data standards such as HL7, FHIR, CCD, claims data, EMR extracts, clinical trial data, and patient de-identification
  • Familiarity with infrastructure as code such as Terraform or CloudFormation, plus Databricks, Snowflake, streaming, observability, or DevOps practices
