Sr. TL SRE

hirebridge

Hyderabad 8 Years Exp Posted 67d ago

Roles and Responsibilities

· Own production services end to end, with accountability for reliability, availability, scalability, performance, and operational health.

· Define and manage SLIs and SLOs, using error budgets to guide delivery decisions.

· Influence service and system design to improve fault tolerance, observability, and operational sustainability.

· Debug complex production issues across application code, services and infrastructure using software engineering practices.

· Perform root cause analysis using logs, metrics, traces, and code-level investigation.

· Build automation and self-healing mechanisms to prevent repeat failures.

· Execute production changes (patching, certificate management, software releases) with safety, automation, and observability.

· Design and operate production observability aligned to service health and customer impact.

· Lead and participate in incident response for high-severity events.

· Collaborate with engineering, product, architecture, and operations teams.

· Operate with autonomy and sound judgment in reliability decisions.

Skills & Requirements

Qualifications:

· 8 to 12 years of hands-on Site Reliability Engineering or reliability-focused engineering experience with end-to-end service ownership.

· Proven track record of operating at senior engineering scope, with accountability for reliability outcomes.

· Strong software engineering skills in C#, .NET, Java, Python, React, or similar technologies.

· Practical experience applying SRE principles (SLIs, SLOs, error budgets).

· Hands-on experience with AWS, Kubernetes, CI/CD, infrastructure as code and hybrid environments.

· Strong knowledge of Linux and Windows systems, application platforms and relational databases.

· Bachelor’s or master’s degree in computer science, or equivalent experience.

· Participation in an on-call rotation; flexible hours as required.