Cloud Engineer
Job Description
Key responsibilities:
- Design, implementation, and support of large compute clusters
- Administration of on-prem and cloud-hosted Linux systems
- Management and optimization of Slurm Workload Manager
- Implementation of automation solutions for system management, maintenance, and monitoring
- Conduct troubleshooting, incident management, and problem management
- Level 3 end-user support (technical escalation)
- Conduct performance monitoring and optimization
- Support the Lustre parallel file system
- Ensure all technical solutions align with the vision, principles, architecture, and standards defined by the relevant architecture teams, while ensuring information protection and data privacy
Required qualifications:
- At least 5 years of experience in Linux, cloud, and storage system administration in a large-scale enterprise environment
- Knowledge of one or more of the following areas: HPC job scheduling systems (e.g., Slurm or PBS), parallel file systems (e.g., Lustre), Azure VM Scale Sets, underlying infrastructure supporting Oil and Gas applications, and configuration management technologies (e.g., Satellite, Ansible, Python, and Azure)
- Bachelor's degree in Computer Science, Information Systems, or a comparable field