Site Reliability Engineer
Citi
Job Description
Responsibilities:
- Work in our SRE team to design, implement, and maintain monitoring solutions using tools like Grafana, Kibana, Prometheus, and AppDynamics.
- Contribute to projects and sprints that firm up SLOs/SLIs for developer pipeline applications, ensuring high availability and performance.
- Proactively identify application performance bottlenecks, bring your findings to the sprint stand-up, and help implement solutions.
- Use Python/JavaScript to write scripts that automate and streamline operational tasks.
- Take ownership of problems and work with the larger team to drive resolution. A bias for action is preferred, and upskilling is encouraged.
- Analyze system logs and metrics to identify the root cause of issues.
- Effectively communicate technical details to both technical and non-technical audiences.
- Work collaboratively with development, operations, and other teams to ensure alignment and smooth execution.
- Contribute to a culture of continuous improvement and knowledge sharing.
As an SRE, you will:
- Get exposure to the latest cutting-edge technology in use at Citi, a market leader, such as Tekton and Harness, involving OpenShift.
- Get hands-on experience with Gen AI, developing use cases from idea inception to product delivery.
- Work with agile tools to manage tasks.
- Learn from the ground up how to build observability for our supported applications, and be part of the team that designs SLOs/SLIs to improve application performance.
Skills Required:
- Proficiency in Python or JavaScript, or a willingness to learn.
- A good understanding of the software development lifecycle and pipeline management. Working experience is an added advantage.
- Basic understanding of observability principles and SLOs/SLIs.
- Experience using monitoring tools like Grafana, Kibana, Prometheus, and AppDynamics.
- Basic working knowledge of data visualization tools like Tableau.
- Understanding of Agile concepts and related processes.
- Working or theoretical knowledge of OpenShift, Tekton, and Harness pipelines.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.