Site Reliability Engineer

Thomson Reuters

Hyderabad | 3 years' experience

Job Description

In this role as a Site Reliability Engineer, you will:

Work with application teams to move applications into production and support them there.
Continuously improve the ongoing support model, including release and change management, for maintaining the strategic environments (e.g., production and non-production).
Provide well-written documentation and technical presentations on projects supported by the team.
Provide problem management services, using diagnostic and debugging tools to aid troubleshooting, and take part in a 24x7 rotating on-call (pager) schedule.
Coordinate the implementation of application monitoring, establish support documentation, and provide training on products and procedures.
Provide technical assistance with troubleshooting and performance tuning of the supported environment(s).

About You
You're a fit for the role of Site Reliability Engineer if your background includes:

3-5 years of experience in an enterprise-level operations support role, SRE, or DevOps role.
Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
Expertise in observability and monitoring tools such as Datadog, AppDynamics, and Splunk.
Deep understanding of application performance monitoring (APM) and user monitoring.
Knowledge of Infrastructure as Code (IaC) tools such as AWS CloudFormation, Ansible, and Terraform, and the ability to apply cloud compliance standards to application design to achieve reliability.
Experience in site reliability engineering for .NET, Java, Kubernetes, and database platforms (such as Postgres).
Experience with load balancers, networking concepts, and AWS services such as ECS, EMR, Step Functions (state machines), CloudFormation, CloudWatch, Lambda, SQS, ECR, Fargate, and Elasticsearch.
Sound knowledge of ITSM processes, SLI/SLO/SLA management, incident resolution, and automation techniques.
Strong IP networking fundamentals and experience with standard protocols and message formats (e.g., TCP/IP, HTTP, SOAP, RESTful APIs, XML/JSON, JDBC, JMS/MQ).
Ability to analyze application and server logs and interpret errors.
Experience in incident response and recovery, including implementing processes for incident response, monitoring, and automated recovery.
Scripting knowledge in PowerShell, Bash, or other shell scripting.
Ability to code in at least one programming language (e.g., Java, C#, Python, JavaScript).
Working knowledge of ITIL Change and Incident management processes.
Excellent written and verbal communication skills and strong collaboration skills.

Similar Openings for You

DevOps Lead

Senior DevOps Engineer

DevOps Engineer III

DevOps Engineer II