Sr. Site Reliability Engineer
BlackLine
Job Description
Make Your Mark:
- Drive the operational health of a suite of applications and services, with a focus on performance, responsiveness, capacity, and availability.
- Facilitate our DevOps and SRE practices day to day while working as part of the team.
- Deliver multi-release, potentially multi-year technical projects, from strategy through implementation, and in doing so demonstrate an understanding of business context across product lines.
- Apply a very high standard of technical judgment, innovation, and execution to open-ended problems that require difficult prioritization and trade-offs, often defining precisely what gets done, how, and when.
You'll Get To:
- Develop new methods, processes, and tools that are useful to colleagues and others.
- Demonstrate a deep, comprehensive understanding of cloud technologies and services, and approach problems with a logical, systematic mindset.
- Clearly communicate technical information and project updates to non-technical team members and management. Maintain up-to-date documentation for configurations, procedures, and troubleshooting guides.
- Effectively manage time and prioritize tasks to meet project deadlines and SLAs. Balance reactive tasks (incident response) with proactive initiatives (optimization).
- Work effectively with peers to deliver projects and visions, even when you are not the primary contributor to their technical execution.
- Take responsibility for assigned tasks, services, applications, and BlackLine as a whole. Deliver more than the bare minimum requirements. Consider the needs and goals of internal and external customers, and use that understanding to inform delivery.
- Engage in strategic planning and provide input.
- Maintain documentation and the operational knowledge base.
What You'll Bring:
Required:
- Hands-on problem-solving and root cause analysis skills, technical leadership, and mentoring ability.
- Strong written and oral communication skills.
- Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence.
- Lead by example, care for your team, and establish credibility with the quality of the team's technical execution.
- Participate in and manage the on-call rotation for the SRE team.
- Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of BlackLine's services.
- Cross-system and full-stack architecture experience and awareness.
- Ability to communicate well with business owners, executives, and technical staff at the appropriate level for each audience.
- Experience with software development processes and methodologies.
- Experience with cloud computing technologies, such as AWS, Azure, and Google Cloud Platform.
- Experience with cloud-based development and deployment.
- Experience with cloud-based monitoring and troubleshooting.
- Ability to learn and adapt to new technologies.
Preferred Skills and Qualifications:
- Relevant degree (B.Tech, B.E., or M.Tech) in computer science or a related field.
- 6 to 9 years of overall experience in SRE.
- Cloud Knowledge: GCP (preferred).
- OS Skills: Linux, Windows.
- IaC: Terraform, Chef/Ansible.
- Containerization: Kubernetes, Docker, Nomad.
- CI/CD: Jenkins, GitHub Actions, Azure DevOps, Harness (preferred - Jenkins, GitHub Actions).
- Scripting: Python, PowerShell, Shell.
- Version Control: Git.
- Cloud Certifications: Preferred certifications like HashiCorp Terraform Associate+, Kubernetes and Cloud N