NOC Engineer
Bridgenext
Job Description
- Monitor applications and infrastructure using New Relic, Datadog, Grafana, and related observability tooling; maintain dashboards and actionable alerting.
- Create and tune alerts, and reduce alert noise.
- Provide L1/L2 incident response in a 24×7 environment; triage alerts, restore service quickly, and manage escalations.
- Perform deep troubleshooting across Linux systems, Kubernetes workloads, infrastructure components, and network paths.
- Conduct log analysis using New Relic, ELK, or similar platforms to identify patterns, correlate events, and support root cause analysis.
- Build and enhance automation for routine operational tasks, alert remediation, and reporting using Python and Bash.
- Manage infrastructure changes using Terraform and follow Infrastructure-as-Code practices (review, version control, rollback readiness).
- Support Kubernetes platform operations by assisting with deployments, performing cluster/service health checks, executing scaling and recycling activities, monitoring capacity and performance, and troubleshooting issues.
- Maintain clear runbooks, SOPs, and shift handover notes; ensure knowledge is captured and reusable.
- Partner with engineering and cloud/infrastructure teams to improve reliability through post-incident reviews, problem management, and continuous improvements to observability.