Staff Site Reliability Engineer

Veeam

Pune | 8 Years Exp | Posted 253d ago

Job Description

Reliability Engineering & Resilience:

  • Act as a technical authority in your area, mentoring senior engineers and guiding design choices that improve service reliability and resilience
  • Lead the definition and enforcement of SLIs, SLOs, and error budgets; drive adherence across engineering teams
  • Collaborate with Staff peers across teams to align strategy and champion shared reliability standards and goals
  • Partner with development and product teams to proactively design for failure, build resilient architecture, and operationalize reliability from the start

Observability & Operational Excellence:

  • Drive company-wide adoption of observability best practices and tooling
  • Ensure metrics, logs, and traces provide deep, actionable insights across systems
  • Lead complex incident responses, postmortems, and systemic reliability improvements
  • Promote and enforce a blameless culture of learning and continuous improvement

Engineering at Scale:

  • Lead initiatives in infrastructure as code, deployment automation, and resilience testing
  • Influence the development and adoption of chaos engineering practices and release validation frameworks
  • Partner with platform and security teams to ensure production readiness

Collaboration & Culture:

  • Work closely with your peer Staff Engineers to plan, align, and deliver against reliability goals
  • Provide architectural guidance and advocate for engineering rigor and consistency
  • Represent the SRE team in technical leadership forums and product planning discussions

What we expect from you:

  • 8+ years of experience in a Software Engineering or SRE role, including technical leadership
  • Demonstrated experience mentoring and guiding senior engineers
  • Deep expertise in building distributed systems on public cloud (Azure preferred)
  • Strong programming skills (e.g., JavaScript, TypeScript, Go, Java, or C#)
  • Hands-on experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry)
  • Mastery of infrastructure automation tools (Terraform, Pulumi) and container orchestration (Kubernetes)
  • Ability to communicate clearly across geographies and disciplines

Will be an advantage:

  • Experience leading SRE initiatives across multiple product teams
  • Background in chaos engineering, incident learning, or performance and load testing
  • Familiarity with global compliance standards (ISO, SOC 2, GDPR, FedRAMP, CMMC)

We offer:

  • Family Medical Insurance
  • Annual flexible spending allowance for health and well-being
  • Life insurance
  • Personal accident insurance
  • Employee Assistance Program
  • A comprehensive leave package, including parental leave
  • Meal Benefit Pass
  • Transportation Allowance
  • Daycare/Childcare Allowance
  • Veeam Care Days: an additional 24 hours for your volunteering activities
  • Professional training and education, including courses, workshops, internal meetups, unlimited access to our online learning platforms (Percipio, Athena, O’Reilly), and mentoring through our MentorLab program
