Engineer, Site Reliability Engineering
Job Description
- Support reliability operations across platforms.
- Monitor and maintain SLOs, SLIs, and error budgets.
- Participate in incident response and post-incident reviews; contribute to root cause analysis and remediation.
- Develop automation for operational tasks, incident response, and compliance.
- Maintain and enhance CI/CD pipelines with integrated testing and deployment automation.
- Implement observability dashboards and alerts using Datadog, OpenTelemetry, and BigPanda.
- Contribute to infrastructure-as-code using Terraform and GitHub Actions.
- Support integration and maintenance of Kong API Gateway and Snowflake data platform.
Service Management & Compliance
- Follow ITIL practices for incident, problem, change, and service request management.
- Use ServiceNow for ticketing, reporting, and workflow automation.
- Ensure runbook accuracy and DR readiness.
- Monitor system performance and cost efficiency.
- Support compliance and audit readiness activities.
Collaboration & Knowledge Sharing
- Work with engineering and product teams to embed reliability into delivery.
- Share technical knowledge through documentation and enablement sessions.
- Participate in global SRE initiatives and cross-regional collaboration.
Person Specification
- Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.