DevOps Infra Engineer
joindevops
Job Description
- Lead the development, automation, provisioning, monitoring, and support of GCP, AWS, Azure, and On-Prem infrastructure.
- Develop and maintain Kubernetes platforms such as GKE, EKS, and Anthos.
- Implement and maintain GitOps-based deployments using ArgoCD and Kustomize.
- Develop, maintain, and oversee Infrastructure as Code (IaC) for cloud and on-prem deployments using Terraform and Ansible.
- Build, implement, and support private networking, VPC/VNET architectures, and hybrid connectivity.
- Configure and manage firewalls, including Palo Alto Networks firewalls and Panorama.
- Develop automation scripts in Python, Shell (Bash), and Batch to simplify configuration and operations.
- Build and troubleshoot IPSec VPNs, tunnelling protocols, and NAT (SNAT, DNAT, UNAT).
- Manage and support VMware environments, including ESXi, vSwitch, DVS, and NSX-T.
- Establish monitoring, alerting, and logging using Datadog, Grafana, Prometheus, Coralogix, ELK, and Splunk.
- Support production environments (L2/L3), handling incidents, root cause analysis, and problem management.
- Collaborate with application, security, and platform teams to ensure infrastructure reliability and compliance.
- Participate in a 24x7 on-call schedule to support production systems when needed.
What We're Looking For
- B.Tech. / B.E. / MCA in Computer Science with 6+ years of experience.
- Must have strong analytical and creative problem-solving skills.
- Extremely high level of accuracy and attention to detail.
- Strong communication skills and ability to work effectively with team members.
- Experience managing GCP, AWS, and Azure cloud platforms, along with On-Prem infrastructure.
- Proficiency in Kubernetes and container platforms, such as GKE, EKS, and Anthos.
- Skilled in Infrastructure as Code (IaC) using Terraform and Ansible.
- Proficient in scripting and automation with Python and Shell/Bash, with experience in API-based integrations.
- Experience with monitoring and alerting tools such as Grafana, Prometheus, and Datadog.
- Understanding of cloud security fundamentals, IAM, network security controls, and secure baseline configurations.
- Ability to work effectively with geographically distributed teams and cross-functional partners.
- Willingness to take part in a 24x7 on-call schedule to support production systems as needed.