Senior Site Reliability Engineer
Barry Callebaut
Job Description
MAIN RESPONSIBILITIES & SCOPE
- Ensure scalability, performance, and reliability of large-scale, cloud-based applications and infrastructure
- Establish monitoring and observability solutions, and address performance bottlenecks, errors, and other issues
- Develop and maintain automated deployment pipelines to facilitate seamless and efficient delivery of software updates while minimizing downtime
- Develop and implement strategies to enable zero downtime deployments
- Resolve incidents promptly to minimize service disruptions
- Create and enforce best practices and standards for the deployment and management of applications, databases, and other resources
- Work closely with cross-functional teams, including developers, DevOps engineers, and QA engineers, to drive continuous improvement and innovation
ESSENTIAL EXPERIENCE & KNOWLEDGE / TECHNICAL OR FUNCTIONAL COMPETENCIES
- Minimum of 10 years of relevant experience
- Good knowledge of IT infrastructure and cloud operations, as well as the design, implementation, and management of highly available and scalable infrastructure
- Proficiency in Azure services, Terraform, observability tools, and techniques for monitoring and troubleshooting distributed systems
- Experience with zero downtime deployment strategies and DevOps tools (e.g. Jenkins, CircleCI, GitHub)
- Independent and self-driven personality, taking responsibility and owning tasks
- Good problem-solving skills and a structured way of working
- Openness to trying and learning new technologies and skills
- Good written and verbal communication skills, including the ability to explain technical problems to non-technical audiences