Senior Site Reliability Engineer

Okta

Bengaluru 5 Years Exp Posted 610d ago

What you’ll be doing

Designing, building, running, and monitoring Okta's production infrastructure
Evangelizing security best practices and leading initiatives and projects to strengthen our security posture for critical infrastructure
Responding to production incidents and determining how we can prevent them in the future
Triaging and troubleshooting complex production issues to ensure reliability and performance
Identifying and automating manual processes
Continuously evolving our monitoring tools and platform
Promoting and applying best practices for building scalable and reliable services across engineering
Developing and maintaining technical documentation, runbooks, and procedures
Supporting a 24x7 online environment as part of an on-call rotation
Serving as a technical subject-matter expert (SME) for a team that designs and builds Okta's production infrastructure, focusing on security at scale in the cloud

What you’ll bring to the role

Are always willing to go the extra mile: see a problem, fix the problem.
Are passionate about encouraging the development of engineering peers and leading by example.
Have experience automating, securing, and running large-scale production Java/Tomcat and containerized services in AWS (EC2, ECS/EKS, KMS, Kinesis, RDS) or other cloud providers.
Have experience deploying and managing Kubernetes (K8s) clusters (EKS preferred), monitoring and alerting in the Kubernetes ecosystem, and deploying microservices.
Have deep knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
Have a deep understanding of, and familiarity with, configuration management tools like Chef, Terraform, and Ansible.
Have expert-level abilities in operational tooling languages such as Ruby, Python, Go, and shell, and are proficient with source control.
Are familiar with industry-standard security tools like Nessus and OSQuery.
Are familiar with data stores such as RDS, S3, Redis, Cassandra, and Elasticsearch.

Experience in the following

5+ years of experience architecting and running complex AWS or other cloud networking infrastructure resources
5+ years of experience with infrastructure-as-code tools such as Terraform, Chef, or Ansible
4+ years of experience with Kubernetes (K8s)
Strong Linux understanding and experience
Strong security background and knowledge
BS in computer science (or equivalent experience)