Senior Site Reliability Engineer

Okta

Bengaluru | 5 years' experience | Posted 486 days ago

Job Description

What you’ll be doing:

Design, build, maintain, and deploy robust tools and pipelines to automate infrastructure provisioning, configuration, and deployment across multiple cloud environments (AWS, GCP, etc.);
Own the setup, maintenance, and ongoing development of the Artifactory application suite, including Artifactory instances, globalization, and disaster recovery;
Create and maintain fully automated CI build pipelines for multiple services;
Architect, implement, and manage highly available and scalable cloud-native platforms leveraging Kubernetes, Linux, and other cutting-edge technologies;
Develop and manage efficient multi-cloud deployment strategies, ensuring seamless application and infrastructure orchestration across diverse cloud environments;
Create and maintain custom Amazon Machine Images (AMIs) tailored to specific application and infrastructure needs, optimizing performance and security;
Collaborate with engineering and operations teams to ensure the reliability, performance, and security of production systems;
Respond promptly to production incidents, troubleshoot complex issues, and implement preventive measures to minimize future disruptions;
Build scalable and extensible platforms, services, and tools using Java, Python, Go, and other relevant technologies, with a focus on automation, reliability, and security;
Identify and eliminate bottlenecks, manual processes, and inefficiencies, implementing automated solutions to improve operational efficiency and reduce human error;
Leverage industry best practices in infrastructure, automation, and orchestration to drive innovation and explore emerging technologies that can enhance the platform's capabilities;
Develop self-service tools and processes to empower teams to independently manage their infrastructure, reducing reliance on manual intervention; and
Prioritize security and compliance by maintaining up-to-date base images, applying security patches, and implementing robust security measures to protect sensitive information and systems.

What we are looking for:

5+ years of experience with Java, Go, Python, or similar backend languages
5+ years of experience building, maintaining, and debugging services, internal tools, and frameworks
5+ years of experience automating and deploying large-scale production services in AWS, GCP, or similar
5+ years of experience managing CI/CD infrastructure, with strong proficiency in tools such as Spinnaker, Jenkins, Argo CD, GitLab, or comparable CI/CD systems used to streamline deployment pipelines and ensure efficient software delivery
Strong understanding of Kubernetes fundamentals, cluster administration, and container orchestration
In-depth knowledge of Linux systems, including system administration, shell scripting, and security best practices
In-depth knowledge of Artifactory or other artifact storage and replication services such as ECR/GCR
Experience in designing, building, and managing complex deployment pipelines across multiple cloud providers
Expertise in creating, configuring, and managing custom AMIs for various workloads and environments