Site Reliability Engineer
Capgemini
Job Description
- Design, implement, and maintain scalable and reliable compute infrastructure, with a focus on Wintel, Linux, VMware, and Red Hat KVM environments.
- Collaborate with development teams to ensure applications are designed for reliability and performance across different operating systems and virtualization platforms.
- Automate repetitive tasks to improve efficiency and reduce manual intervention, specifically within Wintel and Linux systems.
- Monitor system performance, identify bottlenecks, and implement solutions to improve overall system reliability in VMware and Red Hat KVM environments.
- Develop and maintain tools for deployment, monitoring, and operations tailored to Wintel, Linux, VMware, and Red Hat KVM.
- Troubleshoot and resolve issues in development, test, and production environments, focusing on compute-related challenges.
- Participate in on-call rotations and respond to incidents promptly, ensuring high availability of compute resources.
- Implement best practices for security, compliance, and data protection within Wintel, Linux, VMware, and Red Hat KVM systems.
- Document processes, procedures, and system configurations specific to the compute infrastructure.
Primary Skills
- Site Reliability Engineering (SRE)
- Compute Infrastructure
- Wintel Administration
- Linux Administration
- VMware Administration
- Red Hat
- Proficiency in programming and scripting languages: Python, Java, C/C++, Bash
- Infrastructure tools: Terraform, Ansible
- Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack
- Solid understanding of networking, security, and system administration within Wintel and Linux environments
- Experience with CI/CD pipelines and tools: Jenkins, GitLab CI
- Knowledge of database management systems: MySQL, PostgreSQL