Sr. Site Reliability Engineer

sirion

Gurgaon 5 Years Exp Posted 477d ago

Job Description

What You’ll Do:

  • System Monitoring and Incident Management: Monitor the health and performance of critical systems, applications, and services. Respond to incidents, troubleshoot issues, and ensure timely resolution to minimize downtime and service disruptions.
  • Automation and Scripting: Develop and maintain automation scripts and tools to streamline operational tasks, deployment processes, and infrastructure management.
  • Infrastructure Management: Manage and scale the underlying infrastructure, including servers, cloud services, and network components. Implement best practices for configuration management, monitoring, and disaster recovery.
  • Release Management: Collaborate with development teams to ensure smooth and reliable software releases. Participate in the design and implementation of deployment strategies.
  • Performance Optimization: Identify performance bottlenecks and optimize the system to improve reliability and response times.
  • Capacity Planning: Analyze system capacity and plan for future growth to meet increasing demands.
  • Security and Compliance: Implement security best practices and ensure compliance with relevant industry standards and regulations.
  • Collaboration and Documentation: Work closely with cross-functional teams, including developers, product managers, and operations, to ensure efficient communication and knowledge sharing. Document processes, procedures, and troubleshooting guides.
  • On-Call Support: Participate in an on-call rotation to handle urgent issues and incidents outside regular business hours.

What You’ll Need:

  • Experience with Cloud Technologies: Proficiency in working with one or more cloud platforms like AWS, Google Cloud Platform, or Microsoft Azure.
  • Programming and Scripting Skills: Strong knowledge of at least one scripting language, with hands-on shell scripting experience.
  • System Administration: Hands-on experience with Linux/Unix systems; knowledge of system administration and networking concepts is good to have.
  • Monitoring and Logging: Experience with monitoring tools such as Prometheus, Grafana, or Nagios, and with log management solutions like the ELK stack.
  • Version Control: Familiarity with version control systems like Git.
  • Problem-Solving Skills: Ability to analyze and troubleshoot complex technical issues.
  • Communication Skills: Strong verbal and written communication skills to collaborate effectively with team members and stakeholders.
  • Education: Any hands-on individual with a BCA, MCA, or B.Tech background.
