SRE Observability Architect

Virtusa

Bangalore 12 Years Exp Posted 619d ago

Job Description

Required Experience:

• Minimum 10 years of relevant work experience setting up monitoring in critical production environments using any product (Dynatrace, Datadog, ELK stack, Splunk, Grafana/Prometheus, etc.).
• Minimum 5-6 years of work experience in end-to-end observability covering technical, user experience and business outcome metrics. Experience with AIOps is an advantage.
• Experience working with applications hosted on private cloud and cloud-native public cloud (particularly AWS).
• Multi-tenancy setup and data segregation on the observability and AIOps stack.
• Designing and building an Observability & Maintenance (O&M) module for multi-tenant solutions.
• Defining SLIs and setting up SLOs for multi-tenant solutions.

Core Capabilities:

• Experience in implementing container, network, APM, RUM, log analytics, end-to-end tracing, and custom alerts with Grafana, Prometheus, and Grafana Loki (alternatively Logstash or Fluent Bit). Experience implementing the same on a third-party product such as Dynatrace is also considered.
• Proficiency with containers and multi-tenancy setup for the observability solution is critical.
• Ability to configure custom alerts, monitors and build AIOps workflows based on telemetry.
• Good understanding of integrating with other systems via APIs, consuming external APIs for IAM, and ingesting metric-based telemetry via collectors.
• Ability to build custom observability dashboards across different portfolios and personas.
• Setting up Synthetic Monitoring and Test Automation while integrating its telemetry into the observability stack.
• Tenant and data segregation, as well as the ability to obfuscate sensitive information on the common observability schema.
• Ability to code is preferred, ideally in Python or Java, along with Ansible scripting.

Qualification:

• Observability Foundation certification from DevOps Institute or any product-level accreditation.
• Any recognized System Architecture qualification (e.g. TOGAF) is a bonus.

Role & Responsibilities:

• Architect, design and ensure implementation of the entire observability solution, to be packaged as a module in a multi-tenant private cloud solution.
• Implement the observability solution to monitor and apply the same feature set across all tenants (monitoring and acting upon telemetry from tenants, serving as a hypervisor).
• Design and implement integrations as well as externalize APIs.
• Set up authentication and authorization controls by integrating with an IAM layer.
• Work with UI/UX teams to design dashboards for the Observability & Maintenance platform for both the tenants as well as the host.
• Design and set up an AIOps module responsible for automated remediation workflows such as capacity scaling, container restarts, anomaly detection, etc.
• Work on building Proof-of-Concept solutions to view end-to-end tube-maps / service flows for the respective tenant’s services.
• Define and set up a CMDB to serve as a source for the infrastructure and application telemetry.
• Work with other teams to ensure the system is well-tested and scalable, meeting tenant demands.
• Define business-aligned SLIs and set SLOs for core services and journeys.
