Site Reliability Engineer
Finastra
Job Description
Objectives of this Role
- Work in tandem with our engineering team to identify and implement optimal cloud-based solutions for the company.
- Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
- Provide guidance, thought leadership, and mentorship to development teams to build cloud competencies.
- Ensure application performance, uptime, and scalability while maintaining high standards of code quality and thoughtful design.
- Manage cloud environments in accordance with company security guidelines.
- Stay current with industry trends, making recommendations as needed to help the organization innovate and excel.
Responsibilities
- Develop, deploy and maintain infrastructure on Azure using Docker and Kubernetes.
- Implement automation tools and frameworks (CI/CD pipelines).
- Collaborate with team members to improve the company’s engineering tools, systems, procedures, and data security.
- Optimize the company’s computing architecture.
- Conduct systems tests for security, performance, and availability.
- Develop and maintain design and troubleshooting documentation.
- Collaborate with the engineering teams to enable their applications to run on cloud infrastructure.
- Debug technical issues in a complex stack involving virtualization, containers, and microservices.
- Troubleshoot incidents, identify root causes, fix and document problems, and implement preventive measures.
- Employ exceptional problem-solving skills, with the ability to see and solve issues before they snowball into problems.
Requirements
- Bachelor’s degree in computer science, information technology, or mathematics.
- 5+ years of proven experience as a Site Reliability Engineer or in a similar role in software development and system administration.
- Experience with Docker for containerization and application deployment.
- Experience with Kubernetes and Helm for orchestration of Docker containers.
- Experience with Azure cloud services and understanding of their offerings and architecture.
- Knowledge of databases and operating systems.
- Ability to troubleshoot complex software and hardware issues.
- Knowledge of best practices related to data encryption and cybersecurity.
- Excellent problem-solving and communication skills.
- Experience in network, server, and application-status monitoring.
- Operating systems – any Linux/Unix flavor
- Monitoring – Prometheus, Grafana