Staff DevOps Engineer
arm
Job Description
Responsibilities:
- Develop, deploy, and manage scalable, reliable, and secure infrastructure across on-premises environments and cloud platforms such as AWS, Azure, and Google Cloud Platform, including multi-cluster and multi-region Kubernetes environments.
- Develop and maintain automation scripts (Python, Bash/shell, etc.) and automation tools (GitLab, HashiCorp Terraform, HashiCorp Vault, etc.) to streamline and improve deployment, monitoring, and management processes, using Infrastructure as Code (IaC).
- Define and maintain infrastructure automation principles, collaborating with infrastructure teams to embrace and cultivate continuous integration and continuous delivery/deployment (CI/CD).
- Implement and integrate with monitoring and observability solutions, such as AIOps, to proactively detect and respond to system issues.
- Analyze system performance and implement improvements to enhance cost efficiency and user experience.
- Participate in on-call rotations to ensure 24/7 system availability.
- Maintain detailed documentation (HLDs and LLDs) of infrastructure, processes, and procedures to facilitate learning and operational continuity.
- As a Staff Engineer, you will act as a Technical Lead on quarterly prioritized features, supporting the project managers and coordinating with IT teams, scrum masters, and the wider business to deliver projects.
- Adopt a continuous learning mentality to stay updated with industry trends and new technologies to improve operational performance.
Required Skills and Experience:
- Extensive knowledge of cloud platforms (AWS, Azure, or GCP), containerization technologies (Docker, Kubernetes, Rancher, CloudBees, etc.), automation tools (Terraform, Ansible), and monitoring solutions (Prometheus, Grafana).
- Strong scripting and programming skills (Bash, Python, and Go).
- Experience in deploying, maintaining, and integrating HashiCorp Vault, GitLab, Jenkins, Ansible, and Terraform Enterprise platforms with automation pipelines.
- Excellent analytical and problem-solving abilities with a proactive approach to identifying and resolving issues.
- Experience in a DevOps, SRE, or Platform Engineering role, with a proven focus on hybrid infrastructure.
- Good communication and collaboration skills, with the ability to work efficiently in a team-oriented environment.
- Experience working in an Agile delivery environment integrated with Atlassian Jira and Confluence.