Senior Site Reliability Engineer
Nordex
Job Description
- Strong understanding of Java applications, JVM behavior, memory management, garbage collection, and tuning
- Ability to read and debug Java services to support incident response
- Strong experience deploying, operating, and debugging workloads on Microsoft Azure, including:
  - Azure Kubernetes Service
  - Azure Virtual Machines
  - Azure Application Gateway / Load Balancers
  - Azure Monitor, Log Analytics, Alerts, Dashboards
  - Azure Key Vault
  - Azure Networking basics (VNets, subnets, NSGs, Private Endpoints)
- Experience with observability stacks such as Prometheus + Grafana, OpenTelemetry, ELK, Loki, or Azure-native logging
- Experience with CI/CD pipelines for Java applications, including:
  - Canary releases, rolling updates, blue/green deployments
  - Automated rollback mechanisms
  - Artifact storage and versioning
- Experience defining SLIs, SLOs, and SLAs for Java services
- Strong communication during incidents (clear, calm, structured)
- Ability to collaborate with Java developers, DevOps, and platform teams
- Documentation writing (runbooks, RCAs, reliability guidelines)
- Continuous improvement mindset