Senior Software Engineer (AI/ML Platform)

Pune 5 Years Exp Posted 50d ago

Job Description

Responsibilities

  • Design and Implement Scalable AI/ML Serving Systems: Develop scalable and efficient systems for serving AI/ML models, ensuring that these systems can handle varying loads and perform with low latency across diverse environments

  • Hybrid Cloud Architecture Management: Architect and manage a hybrid cloud environment that uses both on-premises resources and multiple cloud platforms (e.g., AWS, Azure, GCP) to optimize performance, cost, and scalability

  • Model Deployment and Versioning: Oversee the deployment of AI/ML models into production, including the setup of CI/CD pipelines for model deployment and versioning, ensuring smooth and reliable model updates and rollbacks

  • Performance Monitoring and Optimization: Implement monitoring tools and practices to track the performance of AI/ML models in production, identifying bottlenecks and optimizing system and model performance for better efficiency and reduced costs

  • Security and Compliance: Ensure that the AI/ML serving systems follow industry standards and regulatory requirements for data security and privacy, including the management of data encryption, access controls, and audit trails

  • Collaboration and Leadership: Work closely with AI/ML researchers, data engineers, and other partners to translate complex AI/ML models into production-ready systems, providing technical guidance throughout the project lifecycle

  • Research and Innovation: Stay informed about the latest developments in AI/ML technologies, cloud computing, and software engineering practices, exploring and integrating solutions that can enhance the capabilities and efficiency of the AI/ML serving platform

Minimum Qualifications

  • Educational Background: BS or MS in Computer Science, or equivalent practical experience

  • Experience: 5+ years of experience in software development and engineering, with a solid record of delivering production systems and services

  • Expertise in AI/ML Technologies: Hands-on experience with AI/ML frameworks (such as TensorFlow, PyTorch) and familiarity with the lifecycle of AI/ML model development, from training to deployment

  • Proficiency in Programming Languages: Strong coding skills in languages commonly used in AI/ML and system development, such as Python

  • Experience with Cloud Technologies: Experience with designing and managing systems on hybrid cloud architectures, including working knowledge of cloud service providers like Azure

  • Knowledge of Containerization and Orchestration Tools: Familiarity with containerization technologies (e.g., Docker) and orchestration systems (e.g., Kubernetes), crucial for deploying and scaling applications in a cloud environment

  • Understanding of DevOps Practices: Knowledge of CI/CD pipelines, infrastructure as code, and other DevOps practices to ensure smooth deployment and operation of AI/ML systems

  • System Performance Optimization: Deep understanding of performance metrics and latency optimization techniques, with the ability to diagnose, tune, and enhance the efficiency of serving systems

Preferred Qualifications

  • Cloud Certifications: Certifications in cloud technologies from major providers (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert), indicating a high level of expertise in cloud services and architecture

  • Experience with Big Data Technologies: Experience with big data technologies and ecosystems (Hadoop, Spark, Kafka) for processing and analyzing large datasets in a distributed computing environment

  • AI/ML Model Monitoring Tools: Familiarity with tools and frameworks for monitoring and managing the performance of AI/ML models in production (e.g., MLflow, Kubeflow, TensorBoard)

  • Prior experience in an on-call rotation for tier-1 services with 24x7 support
