Platform Engineer (DevOps & AI/ML/Gen-AI)
EPAM
Job Description
Responsibilities
- Design, build, and maintain cloud automation workflows using Infrastructure-as-Code tools such as Terraform or CloudFormation
- Develop scalable frameworks for managing infrastructure provisioning, deployment, and configuration across multiple cloud platforms
- Create service catalog components and integrate them with automation platforms such as Backstage
- Use generative AI models to enhance service catalog capabilities, including automated code generation and validation
- Architect and implement CI/CD pipelines for automated build, test, and deployment processes
- Build and maintain deployment automation scripts in languages such as Python or Bash
- Design and implement generative AI solutions (e.g., RAG, agent-based workflows) for AIOps use cases such as anomaly detection and root cause analysis
- Use AI/ML tools such as LangChain, Bedrock, Vertex AI, or Azure AI to build advanced generative AI solutions
- Build vector stores and document sources using services such as Amazon Kendra, OpenSearch, or custom solutions
- Engineer data pipelines that stream real-time operational insights to support AI-driven automation
- Create MLOps pipelines to deploy and monitor generative AI models, maintaining performance and guarding against model decay
- Evaluate and select appropriate LLMs for specific AIOps use cases and integrate them efficiently into workflows
- Collaborate with cross-functional teams to design and improve automation and AI-driven processes
- Continuously research emerging tools and technologies to improve operational efficiency and scalability
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 3-8 years of experience in cloud infrastructure automation, DevOps, and scripting
- Proficiency with Infrastructure-as-Code tools such as Terraform or CloudFormation
- Expertise in Python and generative AI techniques such as RAG and agent-based workflows
- Knowledge of cloud-based AI services such as Bedrock, Vertex AI, or Azure AI
- Familiarity with vector databases and retrieval services such as Amazon Kendra, OpenSearch, or custom database solutions
- Competency in data engineering tasks such as feature engineering, labeling, and real-time data streaming
- Proven experience in creating and maintaining MLOps pipelines for AI/ML models in production environments
-