Cloud Engineer (Azure)
oraclecloud
Job Description
- Design, deploy, and operate highly available, scalable, and secure Azure cloud infrastructure supporting business applications
- Establish Azure reference architectures, best practices, and "golden path" patterns for workload onboarding aligned with existing AWS standards
- Implement Infrastructure as Code using Terraform Enterprise (modules, state management, workspaces) and Terraform Enterprise for Azure resources; maintain parity with AWS IaC patterns where applicable
- Architect and implement Azure Kubernetes Service (AKS) clusters with a focus on security, RBAC, networking, autoscaling, and operational excellence
- Collaborate with Cloud Architecture, Cloud Security, Networking, Identity, Database, and Unix teams to integrate Azure workloads with existing infrastructure and security controls
- Develop and implement Azure governance frameworks: subscription design, management groups, Azure Policy, role-based access control (RBAC), and cost management
- Build and maintain CI/CD pipelines for Azure deployments using GitHub, Bamboo, GitLab, ArgoCD, and Azure DevOps
- Provide consulting and technical expertise to application teams on securely integrating Azure services (Azure Functions, Logic Apps, Service Bus, Cosmos DB, Azure SQL, etc.)
- Troubleshoot complex issues spanning Azure infrastructure, AKS, networking (VNet, NSG, Application Gateway, Azure Front Door), and identity (Azure AD, managed identities)
- Drive compliance and security efforts: implement secure architectures, maintain audit readiness, and support regulatory requirements
- Create and maintain Azure-specific documentation, runbooks, and operational procedures
- Identify opportunities for automation and toil reduction; implement GenAI-assisted workflows where appropriate
- Participate in on-call rotation as needed to support Azure infrastructure and workloads
- Lead or contribute to incident response for Azure-related issues; conduct post-incident reviews and track corrective actions
- Develop runbooks and standard operating procedures for Azure operations; support knowledge transfer to global teams
- Define and monitor SLOs/SLIs for Azure services; drive continuous improvement in reliability and performance
- Maintain compliance with change management, documentation, and security standards