Site Reliability Engineer
Thomson Reuters
Job Description
About the role:
In this opportunity, you will:
- Be a professional SRE: Implement site reliability engineering and DevOps best practices. Feed non-functional requirements into the product backlog, including but not limited to high availability, scalability, self-healing, observability, continuous delivery, and security.
- Build and maintain monitoring for all aspects of the infrastructure, microservices, and the platform, and implement alerting mechanisms using cloud-native solutions
- Provide primary operational support and engineering for distributed platforms
- Act as the go-to person for any production issue: troubleshoot and monitor until successful mitigation, communicate effectively, conduct postmortems, and implement the learnings.
- Maintain IaC and CI/CD, and promote best practices for our CI/CD processes
- Focus on continuous improvement and technical standards – drive improvements in productivity, monitoring, and tooling, and set industry best practices.
- On-call Rotation: Participate in on-call/shift rotations.
- When on-call, you are expected to drive the troubleshooting and mitigation activities while working on an incident.
- Be innovative and curious:
- Maintain end-to-end security, ensuring that we meet best-practice standards
- Keep up-to-date with emerging cloud technology trends, especially around DevOps, Service Reliability and Security.
- Adopt pan-TR operation principles to ensure consistency and efficiency
- Document “tribal” knowledge. Constant upkeep of documentation and runbooks ensures that teams get the information they need right when they need it
- Be collaborative:
- Extreme collaboration across our teams in Canada, the US, Mexico and India
About you:
You're a fit for the role if you have:
- Bachelor’s degree in computer science or a related field (required)
- 6+ years of experience as a DevOps/SRE engineer and cloud engineer, with hands-on experience in AWS cloud technologies.
- Highly skilled in UNIX/Linux-based Systems
- Proven experience in building and operating PRODUCTION cloud-native infrastructure, applications, and services on AWS.
- Experience with or knowledge of container technologies such as Docker and Kubernetes, and the Istio service mesh
- Must have experience using AWS services (such as CloudFront, EKS, ECS, RDS, threat detection and other security controls)
- Must have 2+ years of scripting and programming experience (PowerShell, Bash)
- Experience with or knowledge of observability tools: Datadog, ELK, Sumo Logic, CloudWatch
- Experience with or knowledge of version control and CI/CD (Git / Azure DevOps / JFrog Artifactory)
- Experience with or knowledge of writing Infrastructure as Code (IaC) (Terraform / CloudFormation / other)
- Team player with a can-do attitude