Lead Application Support & Reliability Engineer

UBS

Hyderabad NM Years Exp Posted 557d ago

Job Description

 apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics that identify hot spots to automated deployment and operational risk reduction
 minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
 collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement
 ensure that Ops professionals and product managers review incidents and document the findings to enable informed decision-making; based on post-incident reviews, optimize the overall process of delivery, monitoring, and controls to boost service reliability
 ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
 learn new technologies and practices, reuse strategic platforms and standards, evaluate options, and make decisions with long-term sustainability in mind
 work in an agile way as part of multi-disciplinary teams, participate in agile ceremonies, and collaborate with engineers and product managers
 understand, represent, and advocate for client needs
 share knowledge and expertise with colleagues, help with hiring, and contribute regularly to our engineering culture and internal communities
 drive automation to eliminate toil and increase self-service options for requests and changes
 improve monitoring maturity by implementing SLIs that drive clear SLOs and SLAs
 create baselines for error budgets and for the effort spent addressing toil
 lay out plans to mitigate issues in the shortest possible time to limit the impact on the error budget
 adopt test methodologies such as chaos engineering to enhance resiliency and the ability to withstand unexpected failures
 carry out technical analysis and design, and produce code, tests, documentation, and other engineering artifacts
 drive continuous improvement through Problem Management
