Software Engineer II, Application SRE

India, 2 years' experience, posted 98 days ago

Job Description

This role reports to a leader in Platform Engineering & Site Reliability Engineering.
The engineer builds resilient cloud systems, automates operations, and ensures high reliability through robust monitoring and performance optimization.
The role also involves leading incident response, fostering a collaborative SRE culture, driving team evolution, and proactively preventing issues to maintain optimal service levels.

What you'll do

Assist in building and maintaining highly available and fault-tolerant applications.
Support the setup and maintenance of monitoring, logging, and alerting systems to enable proactive issue detection and faster resolution.
Contribute to automation efforts for infrastructure provisioning, configuration, and deployment to improve operational efficiency and reliability.
Help identify and troubleshoot performance issues, monitor system health, and support defining and tracking SLIs/SLOs under senior guidance.
Participate in incident response activities, assist in documenting post-incident reviews, and help implement preventive measures to improve reliability.
Collaborate with cross-functional teams to promote SRE practices and continuous improvement in operations.

Who you will work with

Co-Founders and Heads of Department (HODs)
Engineering Teams
External customers, mobile network operators (MNOs), and vendors

What we are looking for

2–5 years of experience in Site Reliability Engineering within the telecom or cloud infrastructure domain, focusing on ensuring high availability and reliability of critical business applications.
Experience supporting the implementation of SRE best practices, including incident management, monitoring, and automation, to improve system performance and resilience.
Experience helping design and maintain observability solutions using tools such as Prometheus, Grafana, New Relic, and Dynatrace for proactive monitoring and alerting.
Experience participating in incident response and root cause analysis (RCA), including contributing to post-mortem reviews and documentation for continuous improvement.
Exposure to performance optimization, including capacity analysis, load testing, and tuning of system components under guidance from senior engineers.
Experience supporting automation of infrastructure and deployments using Terraform, Ansible, Helm, and Kubernetes, ensuring consistency and efficiency in delivery.
Experience working in AWS, GCP, or OCI environments, helping build and maintain cloud-native and hybrid architectures.
Ability to partner with cross-functional teams in development, operations, and security to promote a culture of reliability, scalability, and observability.
Hands-on experience with project management tools such as JIRA, and a solid understanding of product lifecycle and agile methodologies.
Strong analytical, troubleshooting, and problem-solving skills with a passion for learning and continuous improvement.
