DevOps Domain SRE
citi
Job Description
Primary Responsibilities:
- Identify key business-critical flows and provide core SRE recommendations to ensure 100% availability of applications from UAT through PERF and PROD.
- Enable production management processes in non-production environments to provide environment stability.
- Serve as a Domain SRE supporting problem management, risk management, change management, and the CI/CD enablement pipeline for the identified SRE function.
- Partner with other DevOps and Engineering support teams to improve MTTR, MTTM, other operational efficiency targets, and organizational Service Level Agreements.
- Improve the overall resilience and stability of the non-production environment by implementing monitoring and observability practices across every component of the system.
- Identify and lead the implementation of service automation to reduce cost, reduce risk, improve efficiency, and enable Service Management to keep up with ever-increasing volume and the fast pace of newer technologies.
- Work with USPB and Wealth DevOps teams and App Dev teams to conduct blameless post-mortems of major incidents, develop executive briefings, assess major incident impacts, and drive service improvements to prevent repeat incidents.
- Execute a robust Service Readiness process. Identify and classify the risks in the non-production estate, work with DevOps team members to create Service Improvement plans, and drive them to closure.
- Prepare SRE data visualization reports and metrics for management, and represent the DevOps team in weekly/monthly operations review meetings.
- Adopt AIOps automation implemented by Production management.
Qualifications:
- 3-6 years of development or production support experience with North America Consumer applications.
- Certified as an SRE Foundation Engineer by DevOps Institute (PeopleCert).
- Solid foundation in and understanding of ITIL concepts.
- Engineering background in system administration, development, DevOps, or an equivalent field, preferably with experience in distributed consumer applications.
- Experience or familiarity with automation technologies, advanced analytics, and predictive modeling using AI/ML.
- Programming experience in at least one language such as Unix shell scripting, Java, or Python.
- Strong knowledge of monitoring and logging/tracing tools such as Splunk, AppDynamics, ELK/Kibana, and Grafana.
- Knowledge of microservices architecture, containerization technologies such as OpenShift and Kubernetes, and API management with Google Apigee.
- Knowledge of, or certification in, public cloud implementation (AWS or Google Cloud) is a plus.