Principal Site Reliability Engineer
spglobal
Job Description
Responsibilities:
- Oversee AWS VPC management, application management, and security.
- Collaborate with development teams to document and implement low-maintenance, cloud-native best practices.
- Develop and deploy automation processes for Infrastructure as a Service (IaaS).
- Create and implement automation processes for the installation and patching of third-party software.
- Design and drive continuous improvement initiatives focused on application health monitoring, reporting, and technical support.
- Document the processes you develop and transition them to the Application and Operating System SRE teams.
- Provide insights and feedback from an SRE perspective on cloud migrations of legacy applications.
- Plan, build, document, and initiate disaster recovery procedures.
- Coordinate with the Cloud Engineering team for IT design support.
- Collaborate with the IT Security team to enforce security policies.
- Work with the Operations team to support troubleshooting of production incidents and requests.
What we are looking for:
- Bachelor’s degree in computer science or a related field, or equivalent experience.
- Minimum of 8 years of relevant hands-on experience in related roles.
- Knowledge of incident management, problem management, and change management.
- Experience with AWS services: VPC, S3, CloudFront, ALB/NLB, Route 53, CloudWatch, EC2, and AWS Security Hub.
- Extensive experience with scripting and infrastructure as code, such as Python, Bash, and Terraform.
- Experience with load balancers, specifically F5 BIG-IP, Avi Vantage, and Apache.
- Foundational knowledge of network topology and the Internet Protocol (IP).
- Good analytical and problem-solving skills.
- Strong interpersonal skills – must be able to work effectively as part of a project/program team and foster team cooperation.
- Strong written and verbal communication skills in English.