Data Engineering – Data Lead
Job Description
1. Enterprise Data Platform Architecture
∙ Design and implement the AWS-based enterprise data lake architecture.
∙ Build scalable frameworks to handle structured, semi-structured, and unstructured datasets.
∙ Define standards for data ingestion, transformation, storage, and access.
∙ Ensure seamless integration with the Databricks analytics platform.
2. Data Ingestion & Integration
Design and develop real-time and batch data ingestion pipelines for sources such as:
Internal systems:
∙ Trading & order management systems
∙ Portfolio management platforms
∙ Client onboarding / KYC systems
∙ CRM platforms
∙ ERP / accounting systems
External sources:
∙ Market data vendors
∙ Research and news feeds
∙ Documents and reports
∙ Audio or surveillance data
Technologies:
Amazon AppFlow, AWS Lambda, AWS Glue, Amazon S3, Amazon Athena
3. Real-Time Data Processing
∙ Develop event-driven data pipelines to support near-real-time data ingestion.
∙ Enable real-time use cases such as trading analytics, operational monitoring, and compliance and surveillance analytics.
4. Security & Governance
Ensure platform security and compliance through:
∙ AWS Key Management Service (KMS) for encryption
∙ AWS Secrets Manager for credential management
∙ AWS Security Hub for security monitoring
∙ AWS Config for configuration governance
∙ AWS CloudTrail for audit trails
5. Monitoring & Observability
∙ Implement monitoring frameworks using Amazon CloudWatch and Grafana dashboards.
∙ Monitor pipeline performance, infrastructure health, data freshness, and ingestion failures.
6. DevOps & Platform Automation
∙ Implement CI/CD pipelines using GitLab.
∙ Automate deployment and testing of data pipelines.
∙ Establish standards for version control, code quality, and automated deployments.
7. Data Quality & Metadata
∙ Implement frameworks for data validation, reconciliation, and monitoring.
∙ Manage metadata and data lineage using the AWS Glue Data Catalog.
8. Integration with Databricks
∙ Deliver curated and optimized datasets for analytics on Databricks.
∙ Collaborate with analytics teams to enable BI, advanced analytics, and ML workloads.
9. Hands-on Technical Leadership
∙ Actively participate in pipeline development, architecture design, and technical problem solving.
∙ Provide technical guidance and code reviews to the data engineering team.
∙ Drive adoption of engineering best practices and reusable data frameworks.
10. Agile Delivery & Collaboration
∙ Operate in an Agile, iterative development environment.
∙ Work closely with analytics teams, business stakeholders, and platform engineers.
∙ Deliver incremental data products and platform capabilities with rapid turnaround.