Engineer, Site Reliability
SES
Job Description
PRIMARY RESPONSIBILITIES / KEY RESULT AREAS
- Ensure system reliability and availability by maintaining infrastructure environments and ensuring software applications run smoothly.
- Monitor system health continuously to detect issues and automatically handle failures.
- Standardize and automate processes, while maintaining internal tools and documentation to support the overall infrastructure and reduce emergency response time. This includes troubleshooting workflows, scripting, tooling, and documentation.
- Mitigate operational risks by identifying and assessing potential risks that could impact system performance, and implementing measures to eliminate them.
- Develop and deploy solutions using industry best practices and innovation, as well as SES IT-defined architecture standards and guidelines.
- Deploy and configure cloud services using the Cloud Center of Excellence methodology as needed.
- Collaborate with various teams to ensure seamless integration and operation of systems.
- Drive improvement of and adherence to all IT service management processes, including incident management, change management, request fulfilment, and problem management.
- Create knowledge base articles and wikis for common issues and daily operations procedures.
- Stay current with the latest IT and Cloud technologies.
- Provide support and drive a strong focus on simplification, standardization, and automation.
- Provide input to department heads for administrative and management tasks.
- Fulfil on-call duties on a rotating 24/7 schedule.
COMPETENCIES
- Self-driven, innovative mindset with a commitment to excellence and quality in all areas.
- Strong knowledge of on-premises systems and infrastructure.
- Excellent problem-solving and troubleshooting skills.
- Good understanding of cloud environments and services.
- Ability to work collaboratively with cross-functional teams.
- Results-driven with a pragmatic service delivery approach.
- Ability to thrive in fast-paced environments with tight deadlines while maintaining attention to detail.
- Strong communication and documentation skills.
- Proactive and self-motivated with a focus on continuous improvement.
QUALIFICATIONS & EXPERIENCE
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum 3 years of applicable professional experience in a large, diverse environment.
- Strong expertise in Linux OS, including system administration, shell scripting, performance tuning, security best practices, and experience with virtualization and cloud platforms.
- Strong expertise in one or more virtualization solutions (e.g., VMware, Nutanix).
- Proven experience in a similar role, with a focus on cloud and on-premises systems environments.
- Experience with automation tools and scripting languages.
- Good knowledge of Windows and Linux OS, Entra ID, Active Directory, CI/CD automation, continuous monitoring, and information security.
- Familiarity with ITIL processes and best practices.
- Fluency in English; any other language is considered an asset.
- Relevant certifications (e.g., AWS, Azure, Google Cloud, RHCE, LFCS) are a plus.