Site Reliability Engineer
Lilly
Job Description
What You Should Bring:
- System Reliability & Performance: Maintain high availability and performance of production systems.
- Incident Management & Response: Monitor system health, troubleshoot issues, and respond to incidents to minimize downtime.
- Automation & Infrastructure as Code (IaC): Develop scripts and automation tools for deployment, monitoring, and infrastructure management.
- Strong problem-solving and analytical skills; highly adaptable to changing circumstances.
- Experience with reliability engineering and monitoring practices, environments, and tools (e.g., cloud ecosystems, preferably AWS; monitoring/observability; configuration management).
- Experience with programming and scripting languages (e.g., Java, JavaScript, Python) in the context of automation tools.
- Experience with large-scale databases, data movement, and analytics tools (e.g., RDS, DynamoDB).
- Experience with ITIL v4 processes, the ITIL framework, and the tools that support it (e.g., ServiceNow).
- Experience with testing frameworks and methodologies (unit, integration, and end-to-end testing) to ensure application quality and reliability. Knowledge of debugging and performance optimization tools for diagnosing and resolving issues.
- Proficient understanding of version control tools and continuous integration/continuous delivery (CI/CD).
- Participate in code reviews and provide constructive feedback to improve code quality and maintainability.
- Troubleshoot and resolve front-end issues, optimize performance, and continuously improve the user experience. Ability to perform root cause analysis (using techniques such as the 5 Whys, fishbone diagrams, or fault tree analysis) to identify and address the underlying causes of quality issues.
- Demonstrated continuous-improvement experience using automation or other technology to improve performance.
Basic Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related technical field
- 8+ years of overall experience, with at least 5 years of experience in full-stack software engineering, including:
- Experience with programming (Java, JavaScript, Go, Python)
- Experience with containers and databases (SQL/NoSQL)
- Experience with CI/CD, cloud, and GitHub technologies
- Experience with automation tools
- Experience with configuration management (Terraform, Ansible, Puppet, Chef)
- Experience with monitoring/logging tools (Prometheus, Grafana, ELK, Datadog)