Site Reliability Engineer
FIS
Job Description
What you will be doing:
- Act as a hybrid engineer, balancing responsibilities across software development and site reliability engineering.
- Design, develop, and maintain Mainframe applications using COBOL, JCL, SORT, and related technologies.
- Monitor, maintain, and improve the reliability and performance of mainframe systems and applications.
- Develop and implement automation tools for system health checks, incident response, and performance tuning to streamline operations and reduce manual intervention.
- Collaborate with cross-functional (development, infrastructure, and operations) teams to design and develop proactive engineering solutions that enhance system scalability, reliability, and resilience.
- Monitor and improve system performance, availability, and scalability across card management platforms.
- Perform root cause analysis of production incidents and drive postmortem processes.
- Drive incident response and continuous improvement initiatives. Participate in on-call rotations and ensure rapid incident resolution.
What you bring:
- 7 to 10 years of experience working as a Site Reliability Engineer (SRE), with deep expertise in mainframe-based technologies.
- Good experience with COBOL, JCL (HP JCL knowledge is an added advantage), VSAM, File-Aid or Insync, Xpeditor, ENDEVOR, etc.
- Experience with IMS DB/DC and DB2 is good to have.
- Ability to apply software development practices to operational challenges.
Added bonus if you have:
- Certifications in SRE, ITIL, or Mainframe technologies.
- Proven experience in Site Reliability Engineering, including automation, monitoring, and performance tuning.
- Experience with observability tools and incident management frameworks.
- Excellent problem-solving and analytical skills with a pragmatic approach to engineering.
- Proficiency in automation and scripting tools.
- Experience with monitoring tools (e.g., Splunk, IBM OMEGAMON).