DevOps, MLOps & GenAI Platform Owner
HPE
Job Description
1. Platform Ownership (Ops + Governance)
- Own production systems: incidents, SLAs, stability, uptime
- Drive RCA and eliminate recurring issues (sync, bots, latency)
- Improve observability, monitoring, and reliability
2. Technical Leadership (Hands-on)
- Build and optimize CI/CD, MLOps, and GenAI pipelines
- Deploy and tune ML/GenAI models (RAG, prompt optimization)
- Manage data pipelines (SharePoint/API ingestion, sync)
- Ensure scalable infra (Docker, Kubernetes, cloud)
3. Automation & Bot Ecosystem
- Own and govern RPA platforms (WorkFusion/UiPath)
- Fix bot failures, duplicates, scheduling issues
- Improve resilience and scalability
4. AI Quality & Governance
- Improve retrieval accuracy, reduce hallucinations
- Optimize latency, cost, and response quality
- Define and drive basic governance and validation checks
5. Stakeholder Management
- Interface with business, analytics, and engineering teams
- Translate business requirements into technical solutions
- Provide status, risks, and KPI reporting
6. Team Leadership (IC+)
- Mentor junior engineers and support teams
- Drive best practices across DevOps/MLOps/AI
- Lead execution for small pods/projects
What you need to bring:
Core Tech
- DevOps/MLOps: Jenkins, GitLab, Azure DevOps
- Cloud + Containers: Azure, Docker, Kubernetes
- Programming: Python, Java, Selenium
- RPA: WorkFusion / UiPath / Automation Anywhere
AI/ML
- GenAI (RAG, prompt engineering, evaluation)
- ML deployment & monitoring
Systems
- Production support at scale, incident management, change management, and ITIL governance
- Data pipelines & enterprise integrations (SAP, ServiceNow, SharePoint, etc.)
Monitoring & Data Tooling
- Prometheus, Grafana, ELK, Spark, Airflow
Success Metrics
- Reduced incident volume & MTTR
- Improved AI accuracy & trust
- Stable bot/automation ecosystem
- Faster delivery of enhancements
- Effective stakeholder management without escalations