Staff DevOps Engineer

arm

Bengaluru, India

Job Description

You will build automation and tooling that supports large-scale storage operations. This includes creating repeatable infrastructure patterns, improving operational workflows, and helping reduce manual intervention across the storage estate.

You will use technologies such as Terraform, Ansible, Python, Git, and monitoring platforms to improve how services are deployed, managed, observed, and supported.

You will support Linux-based infrastructure used by engineering and HPC workloads, including troubleshooting, configuration, patching, filesystem analysis, networking, permissions, package management, and log review.

You will help develop self-service operational workflows through an engineering portal, such as Backstage, making common storage and infrastructure tasks easier for engineering teams to request, track, and consume.

You will work closely with storage, infrastructure, security, and engineering teams to improve reliability, support incident response, investigate root causes, and maintain secure and well-managed systems.

You will also help improve monitoring, alerting, dashboards, documentation, runbooks, and the sharing of technical insights. Where useful, you will explore agentic AI and AI-assisted tooling for anomaly detection, log analysis, triage, reporting, and operational decision support.

Required Skills and Experience:

  • Experience with DevOps, infrastructure engineering, SRE, platform engineering, or similar operational practices.
  • Linux systems administration knowledge, including troubleshooting, filesystems, networking, permissions, processes, patching, package management, and log analysis.
  • Experience with Infrastructure as Code or configuration management tools such as Terraform and Ansible.
  • Ability to develop automation or tooling using Python or a similar language.
  • Experience supporting reliable systems in a production or operational environment.

“Nice To Have” Skills and Experience:

  • Exposure to storage platforms, including file, object, or cloud-integrated storage.
  • Experience with engineering, EDA, HPC, or large-scale technical computing environments.
  • Familiarity with NFS, SMB, object storage, snapshots, replication, backup, or disaster recovery.
  • Familiarity with AWS, GCP, or Azure.
  • Exposure to CI/CD, Git-based workflows, testing pipelines, or release automation.
  • Experience using AI/ML tooling, agent-based workflows, or automation assistants in operations.
  • Understanding of identity, access control, secrets management, and security practices.
  • Experience with platforms such as LakeFS.
