Cloud Platform Engineer
Equifax
Job Description
What you’ll do:
- Platform Maintenance:
  - Monitor the health and performance of our GCP-based machine learning infrastructure, including compute instances, storage, and networking.
  - Troubleshoot and resolve issues related to resource allocation, deployment, and configuration of ML models and pipelines.
  - Collaborate with DevOps teams to implement automated deployment and testing processes for machine learning solutions.
- Incident Management:
  - Triage and resolve support requests related to our ML platform infrastructure.
  - Perform root cause analysis to identify and prevent future problems.
  - Develop and maintain documentation on incident resolution procedures.
- Performance Optimization:
  - Investigate and address performance bottlenecks in our ML environment on GCP.
  - Implement monitoring and alerting systems to proactively identify potential issues.
  - Collaborate with data science teams to optimize resource utilization and cost efficiency.
- Platform Upgrades and Enhancements:
  - Stay up-to-date on new GCP services and releases relevant to machine learning.
  - Plan and execute platform upgrades and enhancements to support evolving ML needs.
  - Work with data science and engineering teams to assess the impact of new technologies and services on existing workflows.
What experience you need:
- BS degree in Computer Science or related technical field involving coding with
- 7+ years of experience as a Cloud Platform Engineer, with good knowledge of VI Platform Engineering
- Strong knowledge of GCP services, particularly those related to machine learning (e.g., Compute Engine, Kubernetes Engine, Cloud Storage, BigQuery).
- Proficiency in Python and experience with scripting languages.
- Experience with containerization (e.g., Docker) and orchestration technologies (e.g., Kubernetes).
- Familiarity with cloud monitoring and logging tools (e.g., Cloud Monitoring, Cloud Logging).
- Experience with DevOps practices and tools (e.g., CI/CD pipelines, Git).
- Ability to quickly diagnose and resolve complex technical issues.
- Strong analytical and troubleshooting skills.
- Proactive approach to identifying and preventing potential problems.
- Ability to effectively communicate technical concepts to both technical and non-technical stakeholders.
- Excellent written and verbal communication skills.
- Ability to collaborate effectively with cross-functional teams.