Site Reliability Engineer
Morningstar
Job Description
Responsibilities
- Lead corporate operations management initiatives based on best practices such as CI/CD, comprehensive monitoring, infrastructure automation, and operational readiness reviews.
- Build world-class data operations by establishing ITIL processes for deployment, problem, and incident management, along with continuous improvement.
- Provide technical triage and troubleshooting by understanding and analyzing financial data systems.
- Support request fulfillment for data systems, such as access management and ETL configuration.
- Lead cross-team operations projects such as disaster recovery (DR), security patching, and AWS resource management.
- Drive automation and innovation for proactive, continuous operations improvement through research into new technologies and development of tools.
- Serve as the primary point of contact with our overseas offices for projects, knowledge transfer, and on-call rotation.
Requirements
- 3+ years of experience in Site Reliability Engineering, DevOps, or related fields.
- Proven experience supporting and managing AWS infrastructure.
- Hands-on experience with containers and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI, Harness).
- Strong scripting skills (Python, Bash, or other languages) for automation tasks.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- AWS Certified Solutions Architect – Associate or Professional.
- AWS Certified DevOps Engineer – Professional (preferred).
- Other relevant certifications (e.g., Docker, Kubernetes, Terraform) are a plus.
- Strong understanding of networking concepts and protocols (DNS, TCP/IP, HTTP/S).
- Excellent problem-solving skills and the ability to troubleshoot complex systems.
- Strong communication skills to interact with technical and non-technical stakeholders.
- Ability to work in a fast-paced environment and manage multiple priorities.