Staff Site Reliability Engineer
Okta
Job Description
Job Responsibilities:
- Work with various teams to design and implement scalable and reliable network solutions
- Maintain a highly available cloud infrastructure edge for the Okta identity platform
- Collect and analyze data to identify the root causes of network-specific events
- Automate AWS infrastructure with Terraform and/or Chef
- Evolve the system by introducing changes to improve efficiency, scalability, and velocity
The ideal candidate is a self-starter who takes pride in designing and implementing durable solutions to network problems. They are passionate about network responsiveness and performance.
Required knowledge and skills:
- 5+ years of experience in a Cloud Network Engineer or related role
- Demonstrated in-depth understanding of the TCP/IP networking stack (layers 2 through 7). Ability to implement a highly available VPC network, including inter-VPC connectivity. Working knowledge of stateless and stateful firewalls. Familiarity with DNS, web application firewalls, and the various load balancing methods available in the cloud.
- Deep knowledge of AWS network concepts such as Transit Gateway, Site-to-Site VPN, and Direct Connect
- Ability to troubleshoot network issues using AWS VPC Flow Logs and CloudWatch metrics, as well as by analyzing standard packet captures
- Experience working with Terraform, Ansible, Chef, Puppet, or similar automation tools
- Able to collaborate effectively with multiple stakeholders