Senior DevOps Engineer

thalesgroup

Bangalore | 5 Years Exp | Posted 376d ago

Job Description

Key Responsibilities:

  • Monitoring & Observability: Design, implement, and maintain sophisticated monitoring, alerting, and logging solutions to ensure the reliability, availability, and performance of our security-focused SaaS platform. Use tools like Prometheus, Grafana, and Datadog to provide deep visibility into system health, security metrics, and application performance.

  • Incident Management: Respond to and mitigate incidents in real time, ensuring minimal impact on customers. Drive post-mortems and root cause analyses (RCAs) to improve monitoring and response processes.

  • System Reliability: Collaborate with cross-functional teams to define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for both security and performance metrics.

  • Automation & CI/CD Integration: Build automated monitoring and alerting pipelines that integrate seamlessly with CI/CD workflows to catch issues early in development, testing, and production environments.

  • Mentorship & Best Practices: Provide guidance and mentorship to junior DevOps engineers, helping them adopt best practices for monitoring, observability, and security.

  • Optimization & Continuous Improvement: Continuously evaluate and refine monitoring tools and practices to adapt to new threats, technologies, and regulatory requirements.

Required Qualifications:

  • 5+ years of experience in DevOps, Site Reliability Engineering, or Infrastructure roles, ideally in cybersecurity or SaaS environments.

  • Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, Splunk, or similar observability solutions.

  • Expertise in Linux/Unix-based systems and cloud environments (AWS, GCP, Azure).

  • Proficiency in scripting languages such as Python, Bash, or Go to automate monitoring tasks and create custom solutions.

  • Deep understanding of security principles and experience integrating security monitoring into DevOps practices (e.g., SIEM systems, threat detection).

  • Experience with containerization (Docker) and orchestration (Kubernetes) to monitor containerized applications in production.

  • Familiarity with Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation to automate infrastructure monitoring setup.

  • Solid problem-solving skills, a keen eye for detail, and a proactive approach to system monitoring and incident response.
