Assistant Manager - Solution Architect
ripplehire
Job Description
Design & Implementation
- Perform detailed analysis of infrastructure design, architecture, and security standards.
- Design and implement comprehensive observability solutions using Datadog, BMC, Nagios, SolarWinds, AppDynamics, and similar platforms.
- Develop and maintain architecture blueprints, design documents, and operational runbooks for observability and event management systems.
- Lead end‑to‑end deployment, configuration, customization, and integration of observability tools across on‑premises and cloud environments.
Operations & Event Management
- Own day-to-day operations of observability platforms, ensuring high availability, performance, and data accuracy.
- Manage event management processes, including event ingestion, correlation, noise reduction, and routing to appropriate resolver groups.
- Monitor platform health, perform capacity planning, and execute regular platform maintenance and upgrades.
- Investigate and resolve issues related to monitoring, logging, metrics, dashboards, alerts, and event correlation.
- Establish and continuously improve alerting standards, thresholds, and SLA-based response models.
- Coordinate with IT operations, SRE, infrastructure, and application teams to drive proactive incident prevention using observability insights.
- Lead root‑cause analysis (RCA) efforts and provide actionable insights from telemetry data to improve system reliability.
Collaboration & Continuous Improvement
- Collaborate with business and technical stakeholders to gather monitoring requirements and ensure alignment with organizational objectives.
- Conduct regular assessments and maturity reviews of observability solutions to identify optimization and automation opportunities.
- Stay current with emerging technologies, industry best practices, and evolving trends in observability, monitoring, and AIOps.