Production Engineering Technical Lead
Cisco
Job Description
Minimum Requirements:
- Design, implement, and maintain scalable, highly available, and fault-tolerant systems in cloud environments.
- Optimize the performance, cost, and efficiency of cloud infrastructure by leveraging cloud-native tools and services.
- Monitor infrastructure and applications to ensure optimal performance, availability, and security.
- Troubleshoot production issues in infrastructure platforms, applications, and services, including root cause analysis and resolution.
- Implement automated monitoring and alerting to identify performance bottlenecks and downtime before they impact users.
- Collaborate with Devx application teams to automate and streamline the deployment of applications and updates to production environments using CI/CD pipelines.
- Ensure smooth and efficient release management, including managing environment configurations and ensuring minimal downtime during production releases.
- Maintain version control and manage rollback strategies for production releases.
- Participate in on-call rotations to provide 24/7 production support for critical incidents in the cloud platform.
- Lead incident management processes, including troubleshooting, escalation, and resolution of production issues.
- Document incidents and solutions for future reference and continuous improvement.
Minimum Qualifications:
- BS/MS in Computer Science.
- At least 12 years of experience, including experience in production engineering, site reliability, or a similar role.
- In-depth experience with container platforms such as Google Anthos.
- Strong understanding of networking, containers (e.g., Docker, Kubernetes), microservices architectures, and distributed systems. Proficient in CI/CD tools (e.g., Jenkins, ArgoCD) and version control systems (e.g., GitHub).
- Strong understanding of CI/CD pipelines, observability (monitoring, logging, tracing), and incident management frameworks.
- Excellent problem-solving skills, with the ability to diagnose and resolve complex production issues.
- Strong communication and leadership skills, with a track record of driving technical initiatives.