Staff Software Engineer – DevOps (Sports Team)
wbd
Job Description
Roles & Responsibilities
As a Staff Engineer – DevOps, you operate at the intersection of DevOps, Platform Engineering, and Site Reliability Engineering. You set technical direction for AWS infrastructure, infrastructure automation, reliability practices, and developer enablement. Your work creates leverage across multiple teams by establishing standards, building shared platforms, and improving system resilience, operability, and developer productivity at scale. This is a senior individual contributor role requiring strong architectural judgment, deep hands-on expertise, and the ability to influence without authority.
Platform, Infrastructure & Architecture
- Own and evolve the technical direction for AWS infrastructure platforms used across multiple services and teams
- Design and standardize Infrastructure as Code (IaC) patterns (Terraform / CloudFormation / CDK) that serve as long-lived foundations for the organization
- Define best practices and guardrails for AWS account structure, networking, IAM, security controls, and cost governance
- Design infrastructure and deployment systems that prioritize high availability, fault tolerance, scalability, and disaster recovery
Reliability, Operations & SRE Practices
- Drive adoption of SRE principles, including error budgets, capacity planning, load testing, and fault injection, to ensure system resilience
- Lead efforts to improve operational excellence, reducing toil through automation, self-service tooling, and preventative controls
- Troubleshoot and resolve complex, high-severity production issues using logs, metrics, traces, and code-level analysis
- Ensure systems remain reliable during peak traffic through proactive capacity and performance planning
CI/CD, Automation & Developer Experience
- Architect and evolve CI/CD pipelines and infrastructure workflows to enable safe, fast, and repeatable delivery at scale
- Partner with application teams to streamline build, deployment, and operational workflows, improving developer productivity and autonomy
- Champion automation across the full infrastructure lifecycle, minimizing manual intervention and operational risk
Technical Leadership & Influence
- Act as a trusted technical authority and reviewer for critical infrastructure and platform designs
- Drive engineering standards and best practices that are adopted beyond your immediate team
- Mentor senior and mid-level engineers, raising the bar for infrastructure, reliability, and operational engineering
- Communicate architectural decisions and trade-offs clearly to engineers and leadership, influencing outcomes through expertise, data, and judgment
What to Bring
- 9-13 years of experience in DevOps, Platform Engineering, SRE, or Cloud Infrastructure, with ownership of production systems at scale
- Deep, hands-on expertise with AWS, including multi-account environments, networking, security, and reliability patterns
- Advanced experience with Infrastructure as Code (Terraform preferred; CloudFormation/CDK acceptable) and setting patterns used by others
- Strong Linux and networking fundamentals; ability to debug complex distributed systems
- Experience with containerized platforms (Kubernetes, EKS, ECS) and microservice architectures
- Proficiency in at least one programming or scripting language (Python, Go, Shell, etc.)
- Experience operating and improving observability systems (metrics, logging, tracing) and using data to drive operational decisions
- Proven ability to influence architectural direction and raise technical standards across teams, not just within one squad
- AWS certifications or exposure to other cloud platforms are a plus, but not required
What We Offer:
- A great place to work
- Equal opportunity employer
- Fast-track growth opportunities