Site Reliability Engineer (SRE)
Siemens
Job Description
Responsibilities/Tasks
Provide and own design, deployment, automation, and scripting solutions that drive new capabilities, visibility, and efficiency
Collaborate with other technical platforms and partners to engineer automated, integrated solutions across tools, services, and teams that increase availability, reliability, and performance
Own internal and external SLAs and ensure they meet or exceed expectations
Be part of maintaining a 24x7, global, highly available SaaS environment
Participate in an on-call rotation that supports our production infrastructure
Troubleshoot production availability incidents that often span multiple teams and services
Lead production incident post-mortems and contribute to solutions that prevent recurrence, with the goal of automated response to all non-exceptional service conditions
Communicate with business and technical partners about incidents as they occur when those incidents critically impact system performance or availability
Required Education and Experience
- Education: Bachelor’s Degree or equivalent experience with at least two years in IT.
- Experience:
- Automation and Scripting: Over 4 years of experience in automation, including scripting and API development.
- Cloud Software Development: At least 3 years of experience in software development in cloud environments.
- Observability Tools: A minimum of 2 years of experience with observability tools such as Datadog, CloudWatch, CloudTrail, Elastic Stack, Grafana, or similar tools.
- Over 2 years of experience with containerization, specifically Kubernetes.
- 2+ years of experience with Amazon Web Services (AWS).
- 2+ years of experience with tools such as Terraform, CloudFormation, Ansible, or similar.