Site Reliability Engineer

Gemraj Technologies Ltd

India | 10 years' experience | Posted 556 days ago

Job Description

MUST-HAVE EXPERIENCE:

  • Candidates should have moved from DevOps to MLOps.
  • Candidates working in GenAI with strong MLOps experience would also be a fit, but must have prior DevOps experience.
  • Candidates with experience in ETL data pipelines and MLOps would also fit the role.
  • Strong Python knowledge is a must, and the candidate should be a hands-on individual contributor.

About the role:

Turing is looking for people to join us in building ML platforms for our Fortune 500 customers. You will be a key member of the Turing GenAI delivery organization, leading a team of Turing engineers with a range of skill sets.

Required skills

  • 10+ years of professional experience building applications on cloud services, including prior experience building machine learning platforms on cloud services.
  • Cloud expertise: Deep knowledge of cloud platforms like AWS, Google Cloud Platform, or Azure, including their machine learning and data services (Azure preferred).
  • DevOps skills: Experience with CI/CD pipelines, infrastructure as code, and containerization technologies like Docker and Kubernetes.
  • Machine learning knowledge: Understanding of ML workflows, model training, and deployment processes.
  • Data engineering: Familiarity with data pipelines, ETL processes, and data storage solutions.
  • Software engineering: Strong programming skills, particularly in languages commonly used in ML, such as Python.
  • System design: Ability to architect scalable, reliable systems that integrate various services.
  • Automation: Expertise in automating workflows and processes across the ML lifecycle.
  • Security and compliance: Knowledge of best practices for securing ML pipelines and ensuring regulatory compliance.
  • Monitoring and logging: Experience setting up monitoring and logging for ML systems.
  • Collaboration: Ability to work with data scientists, software engineers, and other stakeholders.

Roles & responsibilities

  • Evaluate and select appropriate cloud services for each stage of the ML lifecycle
  • Design and implement the overall architecture of the MLOps platform
  • Set up automated pipelines for data preparation, model training, and deployment
  • Implement version control for code, data, and models
  • Ensure the platform is scalable, secure, and compliant with relevant regulations
  • Provide tools and interfaces for data scientists to easily leverage the platform
  • Continuously optimize the platform for performance and cost-efficiency

This role is crucial in bridging the gap between data science and operations, enabling organizations to efficiently develop, deploy, and maintain machine learning models at scale.
