Azure Cloud SRE
Capgemini
Job Description
Primary Skills
- Deep understanding of Microsoft Azure services, including Azure Virtual Machines, Azure Kubernetes Service (AKS), Azure App Service, Azure Functions, Azure Storage, and Azure Networking.
- Experience in designing and implementing highly available, scalable, and secure cloud infrastructure using Azure best practices.
- Proficiency in Infrastructure as Code (IaC) using tools like Terraform or Bicep for automating Azure resource provisioning.
- Strong knowledge of CI/CD practices and tools, especially Azure DevOps, for automating application and infrastructure deployments.
- Expertise in monitoring, logging, and alerting using Azure Monitor, Log Analytics, and Application Insights, including integration with tools such as Grafana or Prometheus.
- Experience with incident response and root cause analysis, and with conducting post-incident reviews that improve system reliability.
- Familiarity with SRE principles such as SLIs, SLOs, and error budgets, and applying them in real-world scenarios.
- Proficiency in scripting and automation with PowerShell, Bash, or Python to streamline operational tasks and incident handling.
- Understanding of security and compliance in Azure environments, including identity and access management (IAM), Azure Policy, and Azure Key Vault.
- Experience with containerization and orchestration technologies, especially Docker and Kubernetes (AKS), for managing microservices-based applications.
Secondary Skills
- Exposure to configuration management tools like Ansible, Chef, or Puppet.
- Experience with cost optimization strategies and tools in Azure.
- Familiarity with Git-based workflows and version control systems.
- Knowledge of hybrid cloud and multi-cloud environments.
- Understanding of Agile and DevOps methodologies.
- Experience with service mesh technologies like Istio or Linkerd.
- Certifications such as Microsoft Certified: Azure Administrator Associate, Azure Solutions Architect Expert, or Google SRE Certificate.