Senior Site Reliability Engineer
Medallia
Job Description
Responsibilities
- Educate application and infrastructure management about SRE approaches.
- Collaborate with product-engineering teams, build strong relationships, and solve complex challenges together.
- Ensure applications and their infrastructure are updated and released at a defined pace.
- Build monitoring, automation, and tooling around applications and related standard procedures to eliminate manual work.
- Troubleshoot complex problems that may span the full service stack.
- Ensure SLAs are met by proactively monitoring and managing the availability of infrastructure and applications.
- Optimize performance of components across the full service stack.
- Be a part of the SRE team on-call rotation.
Qualifications
Minimum Qualifications
- 3+ years of experience with Site Reliability Engineering and/or related software development roles
- Experience with:
  - Building, configuring, and maintaining operational monitoring and reporting tools
  - Operations in on-premises and cloud environments
  - Incident management and change management
  - Complex information security concepts
- Demonstrated knowledge of:
  - Linux OS and fundamental technologies like networking, DNS, mail, IP filtering, etc.
  - Scripting languages (Python, Bash, Groovy, Go, etc.)
  - Traditional web stack (frontend, API, application backend, caches, databases)
  - Asynchronous and reliable application design (message queues, DB replicas, load balancing, auto-scaling, etc.)
  - Kubernetes deployments
  - Release approaches (roll-out, canary, blue/green, etc.)
- Ability to be part of the team’s on-call rotation
Preferred Qualifications
- Strong communication skills
- Experience with:
  - Infrastructure as Code tools (Ansible, Terraform, CloudFormation, etc.)
  - Relational databases such as PostgreSQL
  - NoSQL databases such as Redis, MongoDB, Cassandra, BigQuery
  - Messaging/stream processing platforms such as Kafka
  - AWS (EC2, S3, RDS, etc.)
  - Virtual/cloud or physical networking and relevant firewall/security configuration
  - Microsoft Windows Server, IIS, Active Directory, SQL Server, and PowerShell
  - CI/CD tools such as Jenkins, ArgoCD
  - Jenkins pipelines
- Background working in heavily regulated industries such as banking, finance, or healthcare