Software Engineering
ripplehire
Job Description
System Reliability and Availability: Design and maintain fault-tolerant, high-availability architectures across AWS, Azure, and GCP. Implement redundancy, load balancing, and automated failover strategies.
Cloud Infrastructure Management: Deploy, manage, and optimize cloud resources using IaC tools such as Terraform and Ansible.
Monitoring and Observability: Implement monitoring and logging frameworks using Splunk, Azure Monitor, Dynatrace, AWS CloudWatch, or similar tools to detect and resolve issues proactively.
Incident Management: Lead real-time incident response, root-cause analysis, and postmortems to continuously improve uptime and resilience.
Capacity Planning and Scaling: Predict traffic patterns, optimize resource utilization, and enforce autoscaling and performance best practices.
Automation and Tooling: Develop scripts and internal tooling to automate routine tasks and reduce manual intervention. Languages may include Python, PowerShell, or Bash.
Security and Compliance: Collaborate with security teams to implement secure infrastructure practices, including encryption, role-based access, auditing, and vulnerability management.
Collaboration and Mentorship: Work across engineering and DevOps teams, providing guidance on reliability best practices and mentoring junior SREs.