Senior Site Reliability Engineer
Microsoft
Job Description
Required/Minimum Qualifications:
- 6+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
- 5+ years of hands-on experience developing and supporting infrastructure services for AI or cloud platforms.
- Proven ability to modify componentized, well-architected infrastructure software and collaborate across teams.
- 1+ years of experience with incident management and reliability engineering in cloud or AI environments.
- Excellent interpersonal, communication, and collaboration skills.
Other Requirements:
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Additional or Preferred Qualifications:
- 7+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
- Experience with distributed systems and/or cloud platforms (Azure, Kubernetes, Docker, and the container ecosystem).
- Experience with GPUs, InfiniBand, or similar high-performance technologies.
- Proficiency in RDMA (Remote Direct Memory Access), MPI (Message Passing Interface), and high-performance computing architecture.
- Proficiency in scripting (PowerShell, shell scripting, etc.) and deep expertise in Linux.