Senior Software Engineer - Site Reliability Engineering
Pantheon
Job Description
If all of this sounds interesting to you, read on!
- Working on advanced, globally scaled implementations of WordPress and Drupal CMS systems using the latest Google Cloud Platform offerings.
- Working on a large-scale orchestration platform serving millions of containers, using lower-level Linux facilities such as systemd and cgroups directly.
- Administering, developing, and maintaining standardization and configuration state management with Kubernetes, Chef, Terraform, GCP tooling, Vault, etc.
- Collaborating closely with the wider engineering team to both deliver platform improvements and provide subject-matter expertise for other technical initiatives.
- Owning your team’s production systems, measuring and tracking their health with SLOs, and assisting our dedicated support team in resolving production issues.
- Continuously improving our standard of engineering excellence by implementing best practices for coding, testing, deploying, and communication.
- Supporting Pantheon as a member of the on-call engineer rotation, contributing to the infrastructure’s stability, reliability, and performance that drive Pantheon's success.
- Supporting and meeting with Pantheon customers, as needed, to ensure their success as well as ours.
What You Need to Succeed
- Strong understanding of, and work experience developing with, Python, Go, or any object-oriented programming language.
- Strong understanding and working knowledge of Kubernetes, Terraform, CI/CD pipelines, and release engineering practices.
- Strong understanding of Linux operating system administration.
- Work-related experience with large-scale, high-traffic platforms.
- Work-related experience designing scalable and robust services for production use.
- Clear communication skills and the ability to represent your contributions and ideas with clarity while remaining open and giving space to the contributions and ideas of others.
- Experience participating in system design consulting, platform management, and capacity planning.
- Experience developing and maturing sustainable systems and services through automation and uplifts.
- Ability to balance feature development speed and reliability using well-defined service-level objectives.
- Extensive experience supporting live-site operations and on-call rotations.
- Experience building and operating complex observability tooling such as Grafana Cloud and Prometheus.
Bonus Points
- Working knowledge of Cassandra, MySQL, Redis.
- Working knowledge of React, Node.js, Python, Go.
- Working knowledge of Docker, Chef, CircleCI, Vault.
- Working knowledge of WordPress, Drupal.
- Coding experience beyond simple scripts.
- CKA, CKAD, CKS, or other CNCF certifications.
- Experience supporting and developing open source tooling on public clouds such as GCP, AWS, or Azure.