Designation: InfraAI Engineer
EY
Job Description
- Design and implement AI-integrated infrastructure solutions across cloud platforms (Azure, AWS, GCP), enabling predictive automation, intelligent scaling, and self-healing capabilities.
- Develop and deploy Python-based automation scripts and ML models for infrastructure use cases such as anomaly detection, capacity forecasting, and auto-remediation.
- Integrate AI/ML tools like Kubeflow, MLflow, Azure Machine Learning, Amazon SageMaker, and TensorFlow Serving into infrastructure pipelines and operational workflows.
- Build and manage Infrastructure as Code (IaC) using Terraform, Pulumi, or Ansible, ensuring modular, reusable, and version-controlled infrastructure components.
- Implement AI-enhanced observability by combining telemetry data with ML models to detect patterns, reduce alert noise, and predict failures using tools like Prometheus, Grafana, ELK Stack, and OpenTelemetry.
- Lead or contribute to cloud migration projects, applying AI-driven insights for workload placement, performance tuning, and post-migration optimization.
- Design and maintain CI/CD pipelines for infrastructure provisioning and AI model deployment using Azure DevOps, GitHub Actions, or GitLab CI/CD.
- Collaborate with data science and platform teams to translate model requirements into scalable, secure, and cost-efficient infrastructure.
- Ensure infrastructure governance, compliance, and security by embedding DevSecOps principles and automating policy enforcement.
- Continuously evaluate emerging AI technologies and infrastructure tools to improve automation, reliability, and operational efficiency.
Desired Profile
- 6+ years of experience in infrastructure engineering, cloud automation, or AI-integrated operations.
- Strong hands-on experience with Azure, AWS, or GCP, including compute, networking, and ML services.
- Proficient in Python for scripting, automation, and AI model integration.
- Skilled in Terraform, Pulumi, or Ansible for IaC.
- Experience with AI/ML platforms like Kubeflow, MLflow, Azure ML, or SageMaker.
- Familiarity with containerization (Docker) and orchestration (Kubernetes) for scalable AI workloads.
- Knowledge of monitoring tools such as Prometheus, Grafana, ELK Stack, and OpenTelemetry, and of AI-enhanced observability.
- Strong understanding of cloud security, DevSecOps, and compliance frameworks.
- Excellent problem-solving, documentation, and cross-functional collaboration skills.
Experience
- 6 years and above
Education
- B.Tech. / BS in Computer Science
Technical Skills & Certifications
- Azure AI Engineer Associate, AWS Certified Machine Learning – Specialty, or Google Professional ML Engineer
- HashiCorp Certified Terraform Associate
- Certified Kubernetes Administrator (CKA)