Site Reliability Engineer

Equifax

Trivandrum | 5 years' experience | Posted 521d ago

What you'll do:

Kubernetes: Design, deploy, and manage production-ready Kubernetes clusters.
Cloud Infrastructure: Build and maintain scalable infrastructure on GCP using tools like Terraform.
Performance: Identify and resolve performance bottlenecks in applications and infrastructure.
Observability: Implement monitoring and logging to proactively detect and resolve issues.
Incident Response: Participate in on-call rotations, troubleshooting and resolving production incidents.
Collaboration: Promote reliability best practices and ensure smooth deployments.
Automation: Build CI/CD pipelines, automated tooling, and runbooks.
Problem Solving: Triage complex issues, lead blameless postmortems, and drive remediation.
Mentorship: Guide and mentor other SREs.

What you'll need:

BS in Computer Science or related field.
2+ years of experience developing and/or administering software in a public cloud environment.
5+ years of programming experience (Python, Bash/Shell Script, Java, Go, etc.).
3+ years of experience monitoring infrastructure and application performance.
5+ years of system administration experience, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.).
5+ years of experience working with continuous integration and continuous delivery tooling and practices.