Senior DevOps Engineer
Cognizant
Job Description
Responsibilities
- Apply technical knowledge and problem-solving methodologies to projects of moderate scope, with a focus on improving data and systems running at scale, and ensure end-to-end monitoring of applications
- Resolves most nuances and determines the appropriate escalation path
- Build, support, monitor, and automate web products on private cloud infrastructure
- Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
- Drive initiatives to improve the reliability and stability of web hosting platforms, using data-driven analytics to improve service levels
- Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
- Demonstrates a high level of technical expertise within one or more technical domains, and proactively identifies and solves technology-related bottlenecks in your areas of expertise
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth
- Works toward becoming an expert on the applications and platforms under your influence while understanding their interdependencies and limitations
- Documents and shares knowledge within your organization via internal forums and communities of practice
- Strong knowledge of one or more infrastructure disciplines such as hardware, networking terminology, databases, storage engineering, deployment practices, integration, automation, scaling, resilience, and performance assessments
- Experience with multiple cloud technologies with the ability to operate in and migrate across public and private clouds
- Drive to develop infrastructure engineering knowledge of additional domains, data fluency, and automation knowledge
- Cloud exposure: understanding of and working experience with resiliency, scalability, observability, monitoring, etc.
- Understanding of data objects and structures, and the ability to write SQL queries in response to tickets as needed
- Experience as an SRE on complex, mission-critical applications involving a multitude of components of varying technical generations
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform
- Strong knowledge of site reliability culture and principles, with demonstrated ability to implement site reliability within an application or platform
- Strong knowledge and experience in observability, monitoring, alerting, and telemetry collection using tools such as CloudWatch, Grafana, Dynatrace, Prometheus, Splunk, etc.
- Fluency in at least one programming language or automation tool such as Python, Terraform, Ansible, Java Spring Boot, shell scripting, .NET, etc.