Product Site Reliability Engineer
InvoiceCloud
Job Description
Ownership
- Owns reliability, performance, and stability outcomes for assigned product services in production.
- Acts as an incident responder for critical production issues, leading investigation, mitigation, and resolution.
- Takes accountability for live debugging, root-cause analysis, and corrective actions across environments.
- Ensures systems meet defined reliability, scalability, and availability expectations.
Drives Efficiency
- Develops and maintains scalable .NET and C# services with a focus on performance, resilience, and maintainability.
- Implements and improves CI/CD pipelines, deployment automation, and operational workflows.
- Uses performance monitoring tools such as New Relic, Elasticsearch, and AppDynamics to proactively identify and resolve issues.
- Standardizes observability, alerting, and response practices to reduce mean time to detection and resolution.
Results Driven
- Delivers measurable improvements in system uptime, latency, throughput, and incident reduction.
- Resolves production incidents quickly and prevents their recurrence through preventive engineering.
- Performs rigorous code reviews, unit testing, and validation to ensure high-quality, production-ready code.
- Supports reliable releases by validating readiness and monitoring post-deployment behavior.
Innovative
- Designs and implements cloud-native solutions using Azure services, containers, and Kubernetes.
- Builds and optimizes Azure Functions, batch processes, and message queues to support scalable, resilient data processing.
- Applies automation and AI-enabled analysis to accelerate incident diagnostics, performance tuning, and anomaly detection.
- Continuously improves system architecture, tooling, and workflows as platform scale and complexity increase.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience
- 5+ years of professional experience in software development with strong expertise in .NET, C#, and VB.NET
- Minimum 3 years of hands-on experience with performance monitoring tools such as New Relic, Elasticsearch, or AppDynamics
- At least 3 years of experience with DevOps practices, including CI/CD pipelines, automated deployments, and infrastructure automation
- Strong experience with Azure cloud services and cloud-native application development
- Hands-on experience with Kubernetes for managing containerized applications and microservices
- Proven ability to perform live debugging and resolve production issues in real time
- Strong experience with code reviews, unit testing, and maintaining high-quality code standards
- Experience designing and implementing Azure Functions, batch jobs, and message queues
- Ability to operate under pressure and take ownership of critical production incidents
- Strong problem-solving and analytical skills
- Excellent written and verbal communication skills with the ability to collaborate across teams
Preferred Qualifications
- Experience working in Agile development environments
- Certifications in Azure, DevOps, or related technologies
- Familiarity with additional cloud platforms or container orchestration tools
- Experience supporting fintech, payments, or other high-throughput transactional systems