Site Reliability Engineer (SRE)
gem
Job Description
Responsibilities
- Manage and maintain Kubernetes clusters across cloud platforms, including OpenShift, Amazon EKS, Azure AKS, and Google GKE.
- Implement and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, Argo CD, or GitLab CI/CD.
- Design and maintain observability stacks with tools including Prometheus, Grafana, Loki, OpenTelemetry, and related technologies.
- Optimize system performance and resolve production issues.
- Implement SRE principles, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs), to uphold system reliability.
- Automate infrastructure and operational tasks using programming languages such as Go or Python, and Infrastructure as Code (IaC) tools like Terraform.
- Apply AI skills to engineering tasks, including Vibe Coding, AIOps and automation, Prompt Engineering, and a working understanding of Large Language Models (LLMs) and AI Agents.
- Remain current with emerging technologies, including AI, MLOps, and Edge Computing.
- Contribute to knowledge sharing through technical writing and presentations.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 2-5 years of experience in SRE, Platform Engineering, or DevOps roles.
- Strong expertise in Kubernetes, cloud-native technologies, and major cloud platforms (AWS, Azure, GCP).
- Proficiency in programming languages such as Python, Go, or Node.js.
- Familiarity with CI/CD tools and contemporary deployment practices.
- Knowledge of observability tools and Infrastructure as Code.
- AI skills, including experience with Vibe Coding, AIOps and automation, and Prompt Engineering, along with an understanding of LLMs and AI Agents.
- Certified Kubernetes Administrator (CKA) certification (brownie points!)
- Excellent problem-solving abilities and communication skills.
- A track record of, or interest in, open-source contributions is a plus.
Benefits
- Competitive salary
- Premium health insurance and various health & wellness benefits
- Opportunity to work on cutting-edge technologies
- Collaborative and supportive work environment
- Chance to make a real impact on the company's success