Site Reliability Engineer
Siemens
Job Description
Responsibilities/Tasks for job opportunity
- Lead the design, deployment, automation, and integration of scripting solutions to enhance capabilities, visibility, and efficiency.
- Collaborate with leaders across technical platforms and partners to engineer automated, integrated solutions that improve tool, service, and team interactions, increasing availability, reliability, and performance.
- Oversee performance against internal and external SLAs and ensure it consistently meets or exceeds targets.
- Continuously review and refine SRE standards, processes, and best practices, particularly in incident response and toil reduction.
- Manage a team of engineers participating in a 24/7 on-call rotation to support our production infrastructure.
- Join incident calls that run longer than an acceptable duration.
- Ensure comprehensive post-mortem analysis of production incidents, driving continuous improvement initiatives.
Required Knowledge/Skills, Education, and Experience
- 7+ years of professional experience in SRE or DevOps, with 3+ years of experience in a leadership role.
- Proven experience with automation via scripting and API development
- 2+ years of experience with observability tools (Datadog, CloudWatch, CloudTrail, Elastic Stack, Grafana, or equivalent)
- 2+ years of experience with containerization, specifically Kubernetes
- 2+ years of experience with Amazon Web Services (AWS)
- 2+ years of experience with Terraform, CloudFormation, Ansible, or equivalent tools
- 2+ years of experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent)
Preferred Knowledge/Skills, Education, and Experience
- Familiarity with Agile methodologies and experience working in an Agile/Scrum environment.
- Desired certifications include Datadog, Kubernetes, AWS, or Azure certifications.
- 2+ years of experience as a Site Reliability Engineer or in an equivalent role
- 2+ years of experience with log management tools (e.g., ELK Stack)
- 2+ years of experience in enterprise IT environments with distributed systems
- Senior level system administration experience, including troubleshooting, support, mentorship/training, and oversight