Specialist Engineer III
Verizon
Job Description
What you’ll be doing...
You will support mission-critical and business-critical applications within Verizon, ensuring that the application is available to customers 100% of the time. You will also work as a team member on various projects to monitor, maintain, and improve application availability and stability, under the mentorship of a technical lead who will support you on delivery.
- Developing and maintaining performance metrics for platforms like New Relic, Dynatrace APM tools, and Elasticsearch to ensure high availability and quick troubleshooting.
- Conducting regular platform health checks and optimizations to prevent downtime and enhance user experience.
- Collaborating with engineering and support teams to establish response protocols for incidents and performance issues.
- Interacting with Dev/QA teams to identify root causes (RCA) and re-instrument triggers to prevent future application degradation and outages.
- Designing and implementing CI/CD pipelines using Jenkins, GitLab, and Artifactory to support agile and DevOps workflows.
- Tracking and analyzing CI/CD pipeline performance to identify bottlenecks and optimize for speed and reliability.
Where you'll be working...
This hybrid role will have a defined work location that includes work from home and assigned office days as set by the manager.
What we’re looking for...
You are curious about new technologies and the possibilities they create. You enjoy the challenge of supporting applications while exploring ways to improve upon the technology. You are driven and motivated, with good communication and analytical skills.
You'll need to have:
- Bachelor’s degree or three or more years of work experience.
- Three or more years of hands-on experience as a Site Reliability Engineer (SRE).
- Strong experience maintaining AWS cloud infrastructure.
- An AWS associate-level certification (mandatory).
- An ITIL certification (mandatory).
- Working experience with Java microservices and Spring Boot-based architecture (mandatory).
- Experience with SRE best practices and with meeting system availability targets by defining appropriate SLIs, SLOs, SLAs, and error rates, and by reducing toil.
- Strong knowledge of system monitoring, incident response, and performance tuning to meet MTTR targets, and of capacity planning and scaling of application infrastructure as risk mitigation strategies to avoid performance bottlenecks.
- Experience with infrastructure automation and container orchestration tools such as CloudFormation, Ansible, Docker, Kubernetes, and Helm.
- Deep working knowledge of Linux servers and networking.
- Good knowledge of NGINX setup and proxy configuration on web servers.
- Understanding of ITIL best practices for problem, incident, and change management for critical production applications.
- Knowledge of at least one scripting language: Shell, Groovy, or Python.
- Good understanding of concepts related to computer architecture, data structures, and programming practices.