System Reliability Engineer - Senior Analyst
Deloitte
Job Description
Key Responsibilities:
·System Monitoring: Setting up and maintaining monitoring tools to ensure system health and performance.
·Performance Optimization: Tuning systems and applications for optimal performance and resource utilization.
·Automation: Developing scripts and tools to automate repetitive tasks and improve operational efficiency.
·Capacity Planning: Analysing system usage trends and forecasting capacity requirements to ensure scalability.
·Deployment Management: Managing deployment processes and ensuring smooth rollouts of new software and updates.
·Security Management: Implementing and maintaining security measures to protect systems and data.
·Documentation: Creating and updating documentation for systems, processes, and troubleshooting guides.
·Collaboration: Working closely with development teams to improve application reliability and performance.
·Continuous Improvement: Identifying areas for improvement in systems and processes and implementing changes to enhance reliability and efficiency.
Key Accountabilities:
·System Availability: High uptime and minimal service disruptions, indicating robust reliability.
·Incident Response Time: Meeting SLAs when responding to tickets and to requests from internal teams.
·Mean Time to Recovery (MTTR): Rapid recovery from incidents, ensuring quick restoration of services.
·Automation Effectiveness: High percentage of automated tasks, reducing manual effort and human error.
·Performance Metrics: Improved system performance metrics, such as response times and resource utilization.
·Scalability: Ability to handle increased traffic and workload without degradation in performance.
·Security Posture: Maintaining a secure environment with no significant security breaches or vulnerabilities.
·Documentation Quality: Comprehensive and up-to-date documentation that aids in troubleshooting and knowledge sharing.
·Collaboration: Positive feedback from development teams and stakeholders on effective collaboration and support.
·Continuous Learning: Demonstrated adoption of new technologies and best practices to enhance operational efficiency and reliability.
Work Location: Hyderabad
Shift Timings: 06.30 AM to 03.30 PM
Qualifications
- 3-5 years of professional experience as an SRE, preferably with a focus on Generative AI.
- A bachelor’s or master’s degree in Computer Science, Engineering, Mathematics, or a related field is preferable.
- Experience with platform and software development, and with release management in a mature managed-service environment.
- Knowledge of and experience with DevOps tools and techniques, such as Infrastructure as Code and Immutable Infrastructure
- Experience with CI/CD tools such as GitHub, Azure DevOps (ADO), and ArgoCD, as well as with DevSecOps technologies.
- Familiarity with security protocols such as OAuth, SAML, or OpenID Connect.
- Must be able to communicate and work effectively with people who have a diverse range of skills and experience.