Sr. Data Cloud Architect
GE HealthCare
Job Description
In this role, you will:
- In-depth knowledge of, and hands-on experience with, Python (PySpark) on real-time data streaming technologies: Kafka, Spark, Lambda
- Experience in building data processing pipelines on AWS using Amazon MSK, EMR (Spark Streaming), DynamoDB, Lambda, Glue, and Athena
- Knowledge of device data ingestion and processing using AWS IoT Core, IoT rules, and EventBridge
- Design, implement, and optimize Kafka- and Spark-based near-real-time (NRT) data processing pipelines
- Expertise in building reusable, cloud-native, scalable, and reliable frameworks and tools
- Design and implement reusable and cost-effective solutions to meet functional and nonfunctional requirements such as availability, latency, and fault tolerance
Architectural & Design Skills
- Designing scalable data pipelines for IoT telemetry
- Real-time vs batch processing architecture
- Data governance, security, and compliance
- Cost optimization strategies on AWS
Technical Skill Set
Cloud & Infrastructure (AWS)
- Amazon EMR – for big data processing using Spark/Hadoop
- AWS Lambda, Step Functions – for serverless workflows
- S3, DynamoDB, RDS – for data storage and management
- IAM, KMS, CloudWatch, CloudTrail – for security and monitoring
- AWS IoT Core – for IoT device integration
Big Data & Analytics
- Apache Spark & PySpark – for distributed data processing
- Data ingestion using Kinesis, Kafka, or AWS IoT Analytics
- ETL pipeline design and optimization
- Data lake architecture using S3 + Glue + Athena
Programming & Scripting
- Python – core language for scripting, automation, and data processing
- Boto3 – AWS SDK for Python
- SQL – for querying structured data
- Shell scripting – for automation on EMR or EC2
Education Qualification
Bachelor’s degree in engineering with at least 5 years of experience in relevant technologies.
Desired Characteristics
Technical Expertise:
- Excellent knowledge of software design and coding principles
- Experience working in an Agile environment
- Familiarity with a range of implementation options
- Demonstrates knowledge of technical topics such as caching, APIs, data transfer, scalability, and security
- Experience in building and managing big data solutions, data lakes, data warehouses, data integration, data migration, and business intelligence/artificial intelligence solutions on the cloud (AWS)
- Experience in architecting and implementing data mesh and data fabric solutions on AWS services, including designing domain-oriented data architectures, data products, and data access patterns in a multi-tenant environment
- Expertise in API integration, subscription-based APIs, and multi-tenancy, enabling efficient data exchange and synchronization between applications and platforms
- Familiarity with advanced data management principles and best practices within AWS environments, including data as a service, data modelling, data lineage, data cataloguing, and metadata management
- Ability to develop and maintain data models, schemas, and databases while ensuring high performance, security, and reliability in a global context
- Expertise in data modelling, database design principles, and best practices for data management within a global context
Business Acumen:
- Demonstrates the initiative to explore alternate technology and approaches to solving problems
- Skilled in breaking down problems, documenting problem statements, and estimating efforts
-