Sr Staff AI Platform Engineer - Golang and Kubernetes
Synopsys
Job Description
What You’ll Be Doing:
- Building an AI Platform for Synopsys to orchestrate enterprise-wide data pipelines, ML training, and inference servers.
- Developing an "AI App Store" ecosystem to enable R&D teams to host Gen AI applications in the cloud.
- Creating capabilities to ship cloud-native (containerized) AI applications/AI systems to on-premises customers.
- Orchestrating GPU scheduling within the Kubernetes ecosystem (e.g., NVIDIA GPU Operator, MIG).
- Designing reliable and cost-effective hybrid cloud architecture using cutting-edge technologies (e.g., Kubernetes Cluster Federation, Azure Arc).
- Collaborating with cross-functional teams to experiment, train models, and build Gen AI & ML products.
The Impact You Will Have:
- Driving the development of advanced AI platforms that empower Synopsys' R&D teams.
- Enabling the creation of innovative Gen AI applications that push the boundaries of technology.
- Ensuring the efficient orchestration of data pipelines and ML training, leading to faster and more accurate AI models.
- Contributing to the development of scalable and reliable AI systems that can be deployed across various environments.
- Enhancing Synopsys' capabilities in cloud-native and hybrid cloud architecture, improving flexibility and cost-effectiveness.
- Fostering a culture of innovation and collaboration within the AI and ML engineering teams.
What You’ll Need:
- BS/MS/PhD in Computer Science/Software Engineering or an equivalent degree.
- 8+ years of experience in building systems software, enterprise software applications, and microservices.
- Expertise in programming languages such as Go and Python.
- Experience in building highly scalable REST APIs and event-driven software architecture.
- In-depth knowledge of Kubernetes, including deployment on-premises and working with managed services (AKS/EKS/GKE).
- Strong systems knowledge of the Linux kernel, cgroups, namespaces, and Docker.
- Experience with at least one cloud provider (AWS/GCP/Azure).
- Proficiency in using an RDBMS (PostgreSQL preferred) for storing and querying large datasets.