Architect - Site Reliability Engineer - Compute

pepsicojobs

Hyderabad 9 Years Exp Posted 557d ago

Job Description

Responsibilities

  • Monitor and Respond: Proactively monitor compute infrastructure health and performance, identify potential issues, and respond quickly to incidents.
  • Automate and Optimize: Develop and implement automation tools to streamline compute operations, improve efficiency, and reduce manual intervention.
  • Collaborate and Troubleshoot: Work closely with software engineering, platform, and other teams to troubleshoot complex compute problems and implement solutions.
  • Capacity Planning: Analyze compute resource usage and trends to forecast capacity needs and ensure sufficient resources are available to meet demand.
  • Document and Communicate: Maintain accurate and up-to-date documentation of compute configurations, procedures, and incidents.
  • Participate in On-Call Rotation: Provide 24/7 on-call support for critical compute incidents.

Qualifications

  • Experience: 9+ years in systems engineering or operations, with a focus on SRE principles and practices.
  • Technical Skills: Deep understanding of operating systems (Linux, Windows), virtualization technologies, storage and backup systems, and container and orchestration platforms (Docker, Kubernetes).
  • Problem-Solving: Strong analytical and problem-solving skills, with the ability to identify and resolve complex compute issues.
  • Communication: Excellent written and verbal communication skills, with the ability to collaborate effectively with cross-functional teams.
  • Adaptability: Ability to thrive in a fast-paced, dynamic environment, and adapt to changing priorities.
