Site Reliability Engineer
ZF
Job Description
What you can look forward to as a Site Reliability Engineer (SRE) (m/f/d):
- Implement and maintain our monitoring system; take ownership as necessary during system outages and incidents, conduct root cause analysis of system failures, implement fixes, and participate in incident reviews
- Plan, execute and test software updates, and automate infrastructure management tasks and processes
- Respond to on-call incidents with quick and effective resolutions
- Be customer-obsessed and bring an engineering mindset to troubleshooting
- Continuously look for improvements and drive operational excellence
- Skills: Linux/Unix/Windows administration; cloud platforms (AWS); scripting languages (Python, Shell, PowerShell); configuration management and infrastructure-as-code tools (Puppet, Ansible, Terraform); CI/CD tools (Jenkins, GitLab CI/CD); monitoring tools (Grafana, Dynatrace, Icinga); databases (SQL Server, MongoDB)
Your profile as a Site Reliability Engineer (SRE) (m/f/d):
- 4-7 years of experience as a Site Reliability Engineer in cloud-native production environments (preferably AWS), supporting high-availability, large-scale web-based applications
- Good knowledge of server operating systems (Linux/Windows)
- Awareness of server deployment, management and patching tools
- Experience with monitoring tools such as Dynatrace, Nagios (Icinga), Grafana, CloudWatch or Elasticsearch
- Foundational experience with networking: routers, switches, firewalls and load balancers
- Working knowledge of managing and supporting live 24x7 applications, and hands-on experience managing databases (SQL Server/MongoDB)
- Experience with system automation tools such as Puppet, Ansible and Terraform, and with CI/CD tooling; experience with at least one scripting language (Python/PowerShell/Shell)