DevOps Engineering Manager
Instahyre
Job Description
Team Leadership and Management:
- Lead, mentor, and grow a high-performing DevOps engineering team.
- Establish best practices for hiring, onboarding, and career development within the DevOps function.
- Foster a culture of ownership, collaboration, and continuous improvement.
Infrastructure and Automation:
- Drive the design and evolution of our cloud infrastructure on GCP.
- Architect and implement scalable, highly available, and secure infrastructure-as-code solutions using Terraform, Ansible, Pulumi, or equivalent tools.
- Ensure infrastructure is cost-efficient and optimised for performance.
CI/CD and Release Engineering:
- Own and improve CI/CD pipelines to ensure reliable and fast software delivery.
- Implement best practices for release automation, rollback strategies, and version control.
- Drive adoption of containerisation and orchestration tools like Docker and Kubernetes.
Reliability and Monitoring:
- Define SLAs, SLOs, and incident management processes to ensure system reliability and performance.
- Lead observability efforts: implement tools for monitoring, alerting, and logging (e.g., Prometheus, Grafana, ELK, Datadog).
- Conduct regular post-mortems and implement learnings to improve incident response.
Collaboration and Strategy:
- Partner with engineering and product leaders to align DevOps initiatives with business goals.
- Champion a DevSecOps mindset by integrating security best practices into the development lifecycle.
- Evaluate and adopt tools, processes, and technologies that drive team efficiency and product excellence.
Requirements:
- 12+ years of experience in DevOps, site reliability engineering, or infrastructure roles, with 3+ years in a leadership/managerial capacity.
- Proven track record of designing, building, and scaling cloud-native infrastructure in production environments.
- Strong expertise in CI/CD tools (GitHub Actions, Jenkins, GitLab CI, etc.), containerisation (Docker), and orchestration (Kubernetes).
- Hands-on experience with infrastructure-as-code and cloud automation (Terraform, Ansible, CloudFormation, Pulumi, etc.).
- Deep understanding of system reliability, performance tuning, and security in distributed systems.
- Excellent communication skills with the ability to translate technical concepts into business impact.
- A collaborative mindset and a passion for mentoring and empowering teams.
- A bachelor's or master's degree in computer science, engineering, or a related field.