Data Engineer
irissoftware
Job Description
Basic Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or related field.
- 6+ years of experience in data engineering, with a proven track record of working on large-scale data initiatives.
- Deep expertise in Python and PySpark.
- Strong hands-on experience with Databricks (Spark, Delta Lake, Workflows).
- Strong experience with AWS (S3, IAM, Textract, Bedrock, or equivalent).
- Experience designing and implementing scalable document ingestion pipelines using Databricks Auto Loader and AWS S3.
- Understanding of vector embeddings and semantic search.
- Strong understanding of data governance, privacy, and compliance in regulated industries (healthcare, life sciences).
Good to Have:
- Advanced knowledge of data modeling, lakehouse/lake/warehouse design, and performance optimization.
- Familiarity with generative AI platforms and use cases.
- Contributions to open-source projects or thought leadership in data engineering/architecture.
- Experience with Agile methodologies, CI/CD, and DevOps practices.
- Exposure to FastAPI or API-based ML services.
- Experience evaluating LLM output quality.
Key Responsibilities:
- Design, develop, and optimize complex data pipelines and transformation processes using Snowflake, dbt, and AWS services.
- Develop and maintain scalable data models and schemas in Snowflake, ensuring they meet performance and business requirements.
- Monitor and fine-tune the performance of data pipelines, queries, and data models to ensure optimal efficiency and cost-effectiveness.
- Utilize Snowflake’s features, such as Time Travel, Zero-Copy Cloning, and Data Sharing, to enhance data management and performance.
- Leverage AWS services, such as AWS Lambda, S3, and Glue, to build and manage serverless data processing workflows and data storage solutions.
- Implement data security measures and ensure compliance with data privacy regulations and organizational policies.
- Troubleshoot and resolve complex data issues, including data sync errors, performance bottlenecks, and integration challenges.
- Provide support for data-related incidents and ensure effective resolution of production issues.
- Collaborate with data analysts and other stakeholders to understand data needs and deliver effective solutions.
- Document data processes, models, and workflows, ensuring clear communication and knowledge sharing across teams.
- Independently assess situations, apply sound judgment and discretion, and make decisions on matters of significant impact without direct supervision.