Advanced Cloud Operations Engineer
Job Description
Provides comprehensive day‑to‑day monitoring and operational support for all ACDE Azure Cloud Infrastructure environments, including all associated subscriptions, resources, certificates, and connectivity components. Ensures continuous oversight of system health by monitoring alert dashboards, reviewing critical event and system logs, and proactively identifying, troubleshooting, and resolving operational issues.
Oversees and maintains reliable communication pathways between DN IT environments and Azure private VNets, ensuring secure and efficient routing of data across all ACDE services. Maintains the performance, stability, and 24×7 availability of cloud-hosted solutions.
Manages capacity by scaling resources up and down based on system load. Executes deployments of technical changes, software updates, and cloud infrastructure enhancements. Administers certificate lifecycle management, including issuance, renewal, and rotation, to maintain security and compliance across the ACDE ecosystem.
Supports sprint-based development activities by providing cloud infrastructure monitoring, resource readiness, and operational enablement for evolving feature releases.
Responsibilities
- Monitor and support all ACDE Azure Cloud Infrastructure environments, including subscriptions, resources, certificates, data routing, and private VNet connectivity.
- Proactively analyze alerts and logs to identify, troubleshoot, and resolve operational and performance issues, ensuring 24×7 system availability.
- Collaborate with ACDE developers, data analysts, product teams, and security stakeholders to design cost‑effective, secure, and scalable cloud solutions.
- Lead incident resolution efforts, performing root‑cause analysis and implementing sustainable fixes.
- Deploy and configure cloud infrastructure and applications using automation tools, including automation of certificate lifecycle tasks.
- Implement code‑level changes for custom automations and operational tooling that support sprint development and ongoing feature readiness.
- Manage capacity and optimize cloud performance through scaling, resource tuning, and continuous evaluation of system health and cost efficiency.
- Support AKS (Azure Kubernetes Service) workloads, including Docker‑based deployments, troubleshooting, and performance optimization.
- Recommend improvements for security, reliability, and cost savings, evaluating new Azure services and technologies for ACDE adoption.
- Ensure reliable communication and secure data routing between DN IT environments and Azure private VNets, maintaining connectivity, performance, and security across ACDE systems.