Specialist DevOps & Site Reliability Engineer

EquiLend

Bengaluru | 6 Years Exp | Posted 421d ago

Role Responsibilities

Design, build, and manage CI/CD pipelines that streamline software delivery and reduce lead time to production.
Develop and support scalable, containerized solutions using Docker and Kubernetes.
Implement and manage Infrastructure-as-Code (IaC) using tools such as Terraform and Ansible to ensure consistent environments.
Lead incident response and post-mortem processes, championing best practices in availability, latency, and system resilience.
Define and maintain service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs) for key systems.
Collaborate with application teams to define monitoring strategies and observability standards using Grafana, Prometheus, or similar tooling.
Partner with global DevOps and infrastructure teams to drive automation, performance improvements, and cost optimization in cloud environments (AWS preferred).
Provide technical mentorship to team members and act as a subject matter expert in Site Reliability Engineering practices.
Contribute to the development of operational playbooks and automated runbooks for common failure scenarios.
Continuously evaluate and adopt emerging tools and technologies that support the goals of high availability and rapid delivery.

Required Skills

A minimum of 6 years of relevant experience in DevOps, SRE, or software infrastructure roles.
Strong practical knowledge of CI/CD tooling (e.g., Jenkins, GitLab CI/CD, GitHub Actions) in distributed environments.
Proven expertise with containerization and orchestration, especially Docker and Kubernetes.
Hands-on experience with Infrastructure-as-Code tools like Terraform, Ansible, or Pulumi.
Proficiency in scripting languages such as Python, Bash, or similar for automation and tooling.
Experience implementing SRE principles, including reliability metrics, SLIs/SLOs, and chaos engineering practices.
Strong familiarity with cloud infrastructure, ideally AWS; Azure or GCP experience is also valuable.
Demonstrated experience with monitoring and alerting frameworks, e.g., Prometheus, Grafana, ELK, or Splunk.
AWS certification (Solutions Architect or DevOps Engineer) is a plus.
Excellent collaboration and communication skills, with experience operating across globally distributed teams.