Site Reliability Engineer
Thomson Reuters
Job Description
- Lead proactive monitoring and health management for production and non-production environments; identify options for problem resolution and initiate appropriate actions.
- Conduct and contribute to root-cause analysis of complex system/application issues; document corrective and preventive actions.
- Plan and execute standard installations, upgrades, configurations, and maintenance activities; provide input to technical plans and solutions.
- Develop, configure, and support tooling for system monitoring and troubleshooting; contribute automation to improve repeatability and time-to-restore.
- Liaise with application development, content, customer service, and other software/hardware support teams to manage escalations and coordinate change.
- Maintain accurate, auditable records for change, deployment, and security across environments.
About You:
- 4 to 6 years' experience with PowerShell and/or Python, and confidence using the command line (Bash) for diagnostics and tooling.
- A solid understanding of Windows Server administration, including configuration, security, and maintenance tasks.
- Experience using and integrating APIs, particularly configuring and applying them effectively within business processes.
- Working familiarity with deploying, managing, and troubleshooting applications and services on Microsoft Azure.