Sr. Engineer, Site Reliability

Empower

Bengaluru

Job Description

Job Duties:

•    Handle a diverse set of assignments spanning a wide range of technologies and a high level of complexity
•    Work independently on assignments
•    Work is reviewed for soundness of judgment, overall accuracy and adequacy
•    Guide and implement monitoring in distributed systems, establishing service level indicators (SLIs) along the way
•    Lead the design and instrumentation of metrics collection from multiple sources
•    Improve the availability and performance of systems and services
•    Spread Site Reliability principles and knowledge across the organization
•    Play a lead role in ensuring the high availability, resilience, and scalability of containerized applications in production
•    Manage observability for Kubernetes workloads, specifically on Amazon EKS
•    Measure the effectiveness and business impact of the SRE practice
•    Collaborate with development teams to support releases and create highly scalable, resilient, and maintainable services
•    Work in a GitOps-driven environment

Qualifications:

•    Expert understanding of one or more observability suites and APM tools, such as Datadog, AppDynamics, New Relic, etc.
•    Expert understanding of maintaining high availability and resiliency across AWS infrastructure components, including EKS, EC2, RDS, S3, VPC, and others
•    Strong programming skills in one or more languages such as shell, Go, Python, etc.
•    Expertise with Infrastructure as Code frameworks such as Terraform and CloudFormation
•    Proficiency with containerization and orchestration technologies such as Docker and Kubernetes
•    Good experience setting up DevOps pipelines in Jenkins
•    Experience using GitHub Copilot preferred
