CloudOps - Monitoring Engineer
EY
Job Description
Your key responsibilities
- Monitor cloud infrastructure and platform health using predefined observability tools
- Acknowledge and validate alerts from monitoring tools and determine the next course of action
- Follow standard operating procedures (SOPs) and runbooks for initial triage before escalating
- Reassign unresolved incidents and requests in the ITSM tool to the correct product teams
- Ensure accurate ticket documentation with timestamps, initial analysis, and handoff details
- Communicate clearly with L2/L3 teams and stakeholders to facilitate smooth issue escalation
- Perform shift handovers with detailed updates to ensure continuity in monitoring
Skills and attributes for success
- Basic understanding of AWS cloud services such as EC2, S3, IAM, and Secrets Manager.
- Familiarity with observability tools like Prometheus, Grafana, Datadog, OpenTelemetry (OTEL), Splunk, AWS CloudWatch.
- Ability to follow SOPs and predefined triage steps without deviation.
- Strong attention to detail for accurate documentation and handoffs.
- Good communication skills to effectively collaborate with L3 teams and stakeholders.
To qualify for the role, you must have
- 0-2 years of experience in cloud infrastructure or application monitoring.
- Basic understanding of the AWS cloud platform and of operating systems (Windows/Linux).
- Knowledge of monitoring tools like Azure Monitor, AWS CloudWatch, or third-party tools such as New Relic, Datadog, or Grafana.
- Experience working with ITSM tools.
- Basic knowledge of DevOps tools such as Jenkins, Kubernetes, and Git.