SRE Engineer
IBM
Job Description
Your Role and Responsibilities
- Run the production environment by monitoring availability and taking a holistic view of system health
- Provide primary operational support and engineering for IBM infrastructure
- Create sustainable systems and services through automation and uplifts
- Improve reliability, quality, and time-to-market of our suite of cloud solutions
- Provide support for production escalations and problem resolution for customers
- Proactively identify issues and improvement opportunities
- Diagnose and resolve complex system, application software, security, and related problems that impact system availability
- Gather and analyze metrics from production systems to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Understand business needs to define automation requirements and product architectural solutions
- Develop high-level product specifications with attention to system integration and feasibility
- Define all aspects of development from appropriate technology and workflow to coding standards
- Collaborate with other professionals to determine functional and non-functional requirements for automation software
- Participate in technical reviews of requirements, specifications, designs, code and other artifacts
- Learn new skills and adopt new practices readily in order to develop innovative, cutting-edge software products that maintain the company's technical leadership position
Required Technical and Professional Expertise
- 8+ years of experience in the software industry
- Ability to program in one or more high-level languages, such as Python
- Experience with cloud services and technologies such as VPCs, gateways, network ACLs (NACLs), and security groups
- Experience in network debugging and network routing protocols such as BGP, IS-IS, and others
- Experience in DevOps and Site Reliability Engineering
- Understanding of microservice architecture, Docker, Kubernetes, and other cloud-native technologies
- Knowledge of debugging and monitoring cloud-native applications using DevOps tools such as Prometheus, New Relic, Instana, and others
- Good to have: understanding of the DevOps lifecycle and associated tools such as Git and CI/CD tools like Jenkins, Tekton, Travis, and others
- Understanding of cloud computing (IaaS, PaaS, SaaS) and security principles
- Understanding of software quality assurance principles
- A technical mindset with great attention to detail
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Outstanding communication and presentation abilities