Senior Site Reliability Engineer
Rubrik
Job Description
What You’ll Do:
- Deploy and operate security solutions and supporting infrastructure in cloud and datacenter environments in support of internal customer security needs and FedRAMP requirements
- Develop and automate security tasks ranging from Security Operations to Infrastructure as Code in support of InfoSec initiatives
- Manage the availability, capacity and configuration of InfoSec’s mission-critical applications and services
- Define, measure and monitor SLAs & SLOs for systems and services with the objective of achieving and exceeding availability and reliability goals
- Manage and streamline monitoring systems to enhance observability and enable proactive identification of issues
- Coordinate and manage incidents, upgrades and changes for InfoSec’s applications and services
- Drive post-incident analysis with partner teams and/or vendors to identify root cause and ensure preventative measures are implemented promptly
- Assist in security incident investigations
- Manage a scalable and highly available solution for security logging and drive log onboarding efforts to increase security visibility
- Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps
- Develop and maintain documentation and runbooks to reduce MTTR and inform future automation development
- Work cross-functionally across global time zones, which requires flexible working hours
- Participate in 24/7 on-call rotations
Experience You’ll Need:
- Bachelor's degree in Computer Science or a related field, or equivalent experience
- 8+ years of experience in site reliability engineering, including deploying, managing and troubleshooting security systems across the stack (on-prem and cloud)
- Strong operational mindset focused on availability, reliability, performance and continuous improvement of systems and services
- Operational knowledge of Linux and Windows systems
- Experience with Terraform, Ansible, Vault, Prometheus, Grafana and GitHub
- Proficiency in any scripting language (Python, PowerShell, Perl, Ruby, shell, etc.)
- Working experience in GCP, AWS or Azure
- Experience collaborating with internal customers to establish strong requirements and prioritize work based on outcomes that drive operational effectiveness