AI/ML DevOps Engineer, AS

db

Pune, 4 Years Exp, Posted 24d ago

Job Description

Your key responsibilities

  • Handle Incident, Service, Problem and Change Management for Shared AI Platforms.
  • Monitor production AI/ML models for performance, latency, accuracy, data drift and model drift, and proactively troubleshoot production issues.
  • Automate Model Packaging, versioning and rollbacks.
  • Monitor model inference speed, latency and accuracy.
  • Optimize resource allocation for cost-effective AI workloads.
  • Detect and mitigate data drift affecting model performance.
  • Troubleshoot model failures, latency issues and deployment errors.
  • Collaborate with L3 engineers and data scientists for escalations.
  • Utilize containerization technologies like Docker to package models and dependencies.
  • Continuous Integration/Continuous Deployment (CI/CD):
    • Develop and maintain CI/CD pipelines for automating the testing, integration, and deployment of ML models.
    • Implement version control to track changes in both code and model artifacts.
  • Monitoring and Logging:
    • Establish monitoring solutions to track the performance and health of deployed models.
    • Set up logging mechanisms to capture relevant information for debugging and auditing purposes.
  • Optimize ML infrastructure for scalability and cost-effectiveness.
  • Implement auto-scaling mechanisms to handle varying workloads efficiently.
  • Enforce security best practices to safeguard both the models and the data they process.
  • Ensure compliance with industry regulations and data protection standards.
  • Oversee the management of data pipelines and data storage systems required for model training and inference.
  • Implement data versioning and lineage tracking to maintain data integrity.
  • Collaborate with DevOps teams to align MLOps practices with broader organizational goals.
  • Continuously optimize and fine-tune ML models for better performance.
  • Identify and address bottlenecks in the system to enhance overall efficiency.
  • Maintain clear and comprehensive documentation of MLOps processes, infrastructure, and model deployment procedures.
  • Document best practices and troubleshooting guides for the team.

 

Your skills and experience

  • Excellent communication and presentation skills, highly organized and disciplined.
  • Experienced in working with multiple stakeholders. Ability to create and naturally maintain good business relationships with all stakeholders.
  • Comfortable working in VUCA (Volatility, Uncertainty, Complexity, Ambiguity) and highly dynamic environments.
  • Expertise on the products/technologies below is required:
    • Google Cloud – GKE, Terraform, IAM, BigQuery, Cloud Shell, Cloud Storage
    • AI/ML – AI agents, AI/ML concepts, ML models, Vertex AI, AutoML, BigQuery ML
    • MLOps & CI/CD pipelines, Kubeflow, Vertex AI Pipelines
    • Proficiency in designing, deploying and managing AI agents, e.g. chatbots, virtual assistants
    • GCP Networking, Networking protocols, Security concepts, VPC, Load balancers
  • Basic administration of Unix servers
  • Python, Shell Scripting, SQL
  • Familiarity with fine-tuning and deploying large language models on GCP.
  • Understanding of security best practices, including data governance, encryption, and compliance with AI-related regulations.
  • GCP - Cloud Logging, Cloud Monitoring and AI Model Performance Tracking.
  • 4+ years of work experience in IT (for AVP – 6+, Associate – 4+)
  • Strong problem-solving skills and a passion for AI research
  • Good interpersonal skills with the ability to cooperate and collaborate with other teams

 

Educational Qualifications:

  • B.E. / B.Tech. / Master’s degree in Computer Science or equivalent
  • Added advantage:
    • GCP Certifications
    • Kubernetes Certifications
    • AI/ML educational background, certifications or higher qualifications

 

How we’ll support you

  • Training and development to help you excel in your career
  • Coaching and support from experts in your team
  • A culture of continuous learning to aid progression
  • A range of flexible benefits that you can tailor to suit your needs
