Data Engineer - Operations (BU)
Lilly
Job Description
What You’ll Be Doing:
- Monitor and manage day-to-day operations of data pipelines, ETL jobs, and cloud-native data platforms (e.g., AWS, Databricks, Redshift).
- Own incident response and resolution, including root cause analysis and post-mortem reporting for data failures and performance issues.
- Perform regular system health checks, capacity planning, and cost optimization across operational environments.
- Maintain and enhance logging, alerting, and monitoring frameworks using tools such as CloudWatch, Datadog, and Prometheus.
- Collaborate with development teams to operationalize new data workflows, including CI/CD deployment, scheduling, and support documentation.
- Ensure data quality by running validation checks and reconciliation processes and by verifying compliance with business rules.
- Work with vendors (if applicable) and internal teams to support migrations, upgrades, and production releases.
How You Will Succeed:
Automation and Self-Service Focus
- Identify repetitive operational tasks and implement automation using Python, Airflow, Jenkins, or similar tools.
- Enable self-service capabilities and alerting for platform users and stakeholders.
AI-Ready Operations Mindset
- Explore and propose how AI can be used to detect anomalies, predict issues, and accelerate root cause analysis.
- Collaborate with internal teams to experiment with LLMs, bots, or ML models for improving operational efficiency.
- Stay informed on emerging AIOps tools and work toward integrating them gradually.
Continuous Optimization
- Monitor pipeline performance and costs, and implement changes that optimize compute, memory, and storage usage.
- Recommend and trial AI/ML-based approaches for pipeline tuning, scheduling, or resource allocation.