Site Reliability Engineer (SRE)
Siemens
Job Description
Responsibilities/Tasks
Provide and own design, deployment, automation, and scripting solutions that drive new capabilities, visibility, and efficiency
Collaborate with other technical platforms and partners to engineer automated, integrated solutions across tools, services, and teams that increase availability, reliability, and performance
Own internal and external SLAs and ensure they meet or exceed expectations
Be part of maintaining a 24x7, global, highly available SaaS environment
Participate in an on-call rotation that supports our production infrastructure
Troubleshoot production availability incidents that often span multiple teams and services
Lead production incident post-mortems and contribute to solutions that prevent recurrence, with the goal of automated response to all non-exceptional service conditions
Communicate with business and technical partners about incidents as they occur when they critically impact system performance or availability
Required Education and Experience
- Education: Bachelor’s Degree or equivalent experience with at least two years in IT.
- Experience:
- Automation and Scripting: Over 4 years of experience in automation, including scripting and API development.
- Cloud Software Development: At least 3 years of experience in software development in cloud environments.
- Observability Tools: A minimum of 2 years of experience with observability tools such as Datadog, CloudWatch, CloudTrail, Elastic Stack, Grafana, or similar tools.
- Over 2 years of experience with containerization, specifically Kubernetes
- 2+ years of experience with Amazon Web Services (AWS)
- 2+ years of experience with tools such as Terraform, CloudFormation, Ansible, or similar
- 2+ years of proficiency with Python
Preferred Knowledge/Skills
**Siemens Teamcenter software**
Desired certifications include:
Datadog, Kubernetes, AWS or Azure certification
More than 2 years of experience as a Site Reliability Engineer or in an equivalent role
2+ years of experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent)
2+ years of experience with log management tools (e.g., ELK Stack)
2+ years of experience in an enterprise IT setting with distributed environments
Networking concepts, including firewalls, VPN, routing, load balancers, security and DNS
Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status.