Sr. Site Reliability Engineer
Pune
Job Description
What You'll Do:
- Build and provision enterprise-grade data, messaging, and analytics platforms in the public cloud
- Ensure that data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective
- Administer Linux machines, web servers, application servers, and databases, and provide infrastructure support for products and businesses
- Own end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
- Develop tools and automation in Ruby, Python, or similar languages to increase availability and performance
- Collaborate with Product and Release Engineering for new product releases and maintenance
- Coordinate change management
- Participate in incident response and blameless postmortems
- Participate in 24×7 on-call rotation for after-hours emergencies
What You Will Bring to Coupa:
- Bachelor’s degree and 7+ years of professional experience
- 3+ years of production support for Elasticsearch/Redis/Kafka (Elasticsearch experience is a must)
- 3+ years of production system administration and web operations experience
- 2+ years of programming experience in Ruby, Java, Perl, Python, or equivalent
- 2+ years of experience with configuration management tools such as Chef, Puppet, Salt, or equivalent
- Experience with AWS or a comparable cloud provider
- Experience with Infrastructure-as-Code tools such as Terraform
- Experience in massive-scale web operations
- Expertise in problem-solving and analyzing globally distributed systems
- Excellent written and verbal communication skills