Site Reliability Engineer Lead

Bank of America

Hyderabad 8 Years Exp Posted 619d ago

Job Description

Responsibilities

  • As part of the SRE team, perform full-stack triage of alerts and engage other engineers to identify the root cause of application performance and stability issues.
  • Work with stakeholders such as product owners to define service level objectives (SLOs) for application features and services.
  • Track performance against SLOs in partnership with development teams or other stakeholders, and ensure systems continue to meet SLOs over time.
  • Design and develop dashboards and reports to communicate key metrics.
  • Identify opportunities to improve alerting posture and create/update alerts accordingly.
  • Work closely with the Engineering team to understand application architecture, perform single-point-of-failure analysis, and create scenarios for testing the resiliency of the application.
  • Create or derive non-functional requirements (NFR) and workload models, and ensure performance and resiliency are considered early in the SDLC.
  • Execute performance and chaos tests, and analyze the results using APM and other tools to identify performance and stability issues.
  • Document any findings/analysis/results, communicate and present to stakeholders.
  • Perform analytics on previous incidents to understand root causes and use automation to reduce the probability and/or impact of problem recurrence.
  • Demonstrate proficiency with DevOps tools, JIRA, ServiceNow, and MS Project, and perform tasks using them.

Requirements

Education: B.E. / B.Tech / M.E. / M.Tech / MCA / M.Sc. (IT/Computer Science)

Certifications If Any: NA

Experience Range: 8 to 10 years of information technology experience, with 5+ years working on a DevOps, SRE, or performance engineering team.

Foundational Skills

  • 8+ years of information technology experience, with 5+ years working on a DevOps, SRE, or performance engineering team
  • Experience triaging production issues using APM tools such as Dynatrace, AppDynamics, or New Relic, and log aggregation tools such as Splunk and ELK
  • Strong experience in Java and front-end (UI/UX) development with React JS or Angular
  • Experience with Apache/Tomcat middleware and Java/RESTful services frameworks (MuleSoft is a plus)
  • Strong skills in Python, UNIX, Wintel, and Perl/shell scripting
  • Strong experience working with CI/CD tools: Bitbucket, JFrog Artifactory, Jenkins, Terraform/Packer, Ansible
  • Knowledge of cloud, container, and Kubernetes technologies
  • Experience with SRE concepts such as SLIs, SLOs, and error budgets, and with working alongside developers to track and improve them on a continuous basis.
  • Must be able to provide oral and written discussion of analytical findings using narrative and graphic forms.
  • Must be able to use qualitative and quantitative analytical skills to assess the effectiveness of the operations.
  • Ability to identify symptoms that point to opportunities for process improvement.
  • Analytical, investigative, and organizational skills.
  • Communication skills, including the ability to craft content for executive-level presentations.

Desired Skills

  • Great soft skills: people and communication skills are essential.
  • Good proficiency in system, network, security and database operations, protocols, and industry standard technologies.
  • Experience with tools such as Tanium, Artifactory, BMC TrueSight Orchestration
  • Experience in command line interfaces (CLI), third party APIs and integration.
  • Experience in server administration with Red Hat Enterprise Linux and Windows Server
  • Good understanding of developing fault-tolerant solutions, and knowledge of horizontal scaling and resiliency/HA.
  • Ability to juggle competing priorities and adapt to changes in project scope.
  • College degree or higher, or equivalent work experience
