Site Reliability Engineer
Lilly
Job Description
How You’ll Succeed:
- As a Platform Leader, you will collaborate within the Automation Center of Excellence product team(s) to promote reliability engineering during all phases of the software lifecycle, detecting and correcting performance issues across products, platforms, and components.
- With an approach that accepts failure and an ability to work with ambiguity, you will use your object-oriented development skills to identify, assess, and reduce the support footprint by recommending and implementing automation and observability practices.
- By proactively influencing development to enable long-term growth, you will lead the organization in engaging with new development to optimize end-to-end application lifecycle support. To do this, you will be accountable for monitoring service level objectives (SLOs) to support system operations and ROI through dashboard development and KPI tracking.
- As a leader in the organization, you will play an important role in defining and championing reliability engineering practices among engineering and product teams, participating in communities of practice, and mentoring others through collaboration.
- Using your problem-solving skills, you will create software, workflows, and utility scripts as published reusable components to broaden impact through federation across all Lilly business areas.
- As a superb communicator with strong interpersonal skills, you will collaborate effectively with your team, including remote team members, to define, design, and deliver new features, ensuring that execution and implementation support the enterprise roadmap.
What You Should Bring:
- System Reliability & Performance: Maintain high availability and performance of production systems.
- Incident Management & Response: Monitor system health, troubleshoot issues, and respond to incidents to minimize downtime.
- Automation & Infrastructure as Code (IaC): Develop scripts and automation tools for deployment, monitoring, and infrastructure management.
- Strong problem-solving and analytical skills, with high adaptability to changing circumstances.
- Experience in reliability engineering and monitoring practices, environments, and tools (e.g., a cloud ecosystem, preferably AWS; monitoring/observability; configuration management).
- Experience with programming and scripting languages (e.g., Java, JavaScript, Python) in the context of automation tools.
- Experience with large-scale databases, data movement, and analytics tools (e.g., RDS, DynamoDB).
- Experience with ITIL v4 processes and framework, and the tools that support them (e.g., ServiceNow).
- Experience with testing frameworks and methodologies (unit, integration, and end-to-end testing) for ensuring application quality and reliability. Knowledge of debugging and performance optimization tools for diagnosing and resolving issues.
- Proficient understanding of code versioning tools and continuous integration and delivery (CI/CD).
- Participate in code reviews and provide constructive feedback to improve code quality and maintainability.
- Troubleshoot and resolve front-end issues, optimize performance, and continuously improve the user experience. Ability to perform root cause analysis (using techniques like 5 Whys, fishbone diagrams, or fault tree analysis) to identify and address the underlying causes of quality issues.
- Demonstrated experience with continuous improvement through automation or applying technology to improve performance.
Basic Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related technical field
- 8+ years of overall experience, with at least 5 years in Full Stack Software Engineering, including:
- Experience with programming (Java, JavaScript, Go, Python)
- Experience with containers and databases (SQL/NoSQL)
- Experience with CI/CD, cloud, and GitHub technologies
- Experience with automation tools
- Experience in configuration management (Terraform, Ansible, Puppet, Chef)
- Experience in monitoring/logging tools (Prometheus, Grafana, ELK, Datadog)
-