Site Reliability Engineer III

Khoros

India 4 Years Exp Posted 473d ago

Job Description

Responsibilities:

Manage environments in the cloud.
Monitor, troubleshoot, and resolve issues related to infrastructure, applications, and services.
Monitor availability and maintain the systems in good health.
Implement automation tools and processes to improve efficiency and reliability.
Participate in on-call rotation and respond to incidents promptly.
Continuously evaluate and improve our systems and processes to enhance reliability and performance.
Document runbooks and procedures.
Work closely with first-level support groups as well as development groups.
Follow departmental change management procedures when defining, planning, and implementing changes, so that service disruption is minimized and Service Level Agreements are met.
Perform incident root cause analysis.
Own projects and issues independently while also working well in a team environment.
Be a Team Player – work in a collaborative, team-oriented environment, share information, respect diverse ideas, interact with customers, and partner with cross-functional and remote teams.
Be Curious & Innovative – keep yourself up to date with next-generation technology and development tools, and contribute to process development practices. Evaluate new technologies and software products to determine the feasibility and desirability of incorporating their capabilities into the company's products.
Be Agile – bring a strong sense of urgency and a desire to work in a fast-paced, dynamic environment, delivering solutions against strict timelines.

Requirements:

4+ years of experience as an SRE in fast-paced, high-traffic environments.
Experience deploying and maintaining applications on at least one cloud platform (AWS: must have; Azure/GCP: good to have).
Working knowledge of Linux and Windows operating systems.
Working knowledge of at least one scripting language: shell/Bash, Python, or PowerShell.
Understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Working knowledge of Jenkins, Ansible, Terraform, and ArgoCD (good to have).
Experience administering databases (MS SQL, MongoDB, etc.).
Extensive experience with monitoring, logging, and observability tools (Sumo Logic, Datadog, AWS CloudWatch, AWS X-Ray, New Relic, Splunk, etc.).
Ability to debug issues and trace them to their underlying cause.
Excellent problem-solving and communication skills.
Ability to work independently and collaborate effectively in a team environment.
Familiarity with agile development methodologies is a plus.
