Site Reliability Engineer
Capgemini
Job Description
Your Role
- Design, implement, and maintain highly available systems on cloud platforms.
- Develop Infrastructure as Code (IaC) using Terraform, ARM templates, or Bicep.
- Implement monitoring, alerting, and observability solutions.
- Drive incident management, root cause analysis, and postmortem processes.
- Automate operational tasks using PowerShell, Python, or Bash.
- Collaborate on CI/CD pipelines using Azure DevOps or GitHub Actions.
- Optimize cost, performance, and security of cloud infrastructure.
- Champion SRE best practices such as SLIs/SLOs and capacity planning.
- Participate in on-call rotations for production support.
Your Profile
- Proven experience in SRE, DevOps, or Cloud Infrastructure roles.
- Hands-on experience with Microsoft Azure or an equivalent platform (AWS, GCP, or OpenShift).
- Proficiency in IaC tools (Terraform, ARM templates, Bicep).
- Strong knowledge of CI/CD, automation, and deployment strategies.
- Expertise in monitoring tools (Azure Monitor, Prometheus, Grafana).
- Scripting skills in PowerShell, Python, or Bash.
- Knowledge of containers and orchestration (Docker, Kubernetes).
- Familiarity with cloud security and compliance.
- Excellent problem-solving and communication skills.