Senior Site Reliability Engineer
Procore
Job Description
What you’ll do:
- Collaborate with your peers to envision, design, and develop solutions in your respective area with a bias toward reusability, toil reduction, and resiliency
- Surface opportunities across the broader organization for solving systemic issues
- Use a collaborative approach to make technical decisions that align with Procore’s architectural vision
- Partner with internal customers, peers, and leadership in planning, prioritization, and roadmap development
- Develop teammates by conducting code reviews and providing mentorship, pairing, and training opportunities
- Serve as a subject matter expert on tools, processes, and procedures, and help guide others to create and maintain a healthy codebase
- Foster an “open source” mindset and culture, both across teams internally and outside of Procore, through active participation in and contributions to the greater community
- Design, develop, and deploy scalable and reliable backend software systems using languages such as Java, Python, or Go
- Work with engineering teams to design and implement microservices architecture
- Develop and maintain APIs using REST, GraphQL, or gRPC
- Ensure high-quality code through code reviews, testing, and continuous integration
- Serve as a subject matter expert in a domain, including its processes and software design, and help guide others to create and maintain a healthy codebase
What we’re looking for:
- Container orchestration with Kubernetes (K8s), preferably EKS
- ArgoCD
- Terraform or similar IaC
- Observability (o11y), ideally with OpenTelemetry
- Public cloud (AWS, GCP, Azure)
- Cloud automation tooling (e.g., CloudFormation, Terraform, Ansible)
- Kafka and Kafka connectors
- Linux systems
- Experience ensuring compliance with security and regulatory requirements such as HIPAA, SOX, and FedRAMP
Experience with the following is preferred:
- Continuous integration tooling (e.g., CircleCI, Jenkins, Travis CI)
- Continuous deployment tooling (e.g., ArgoCD, Spinnaker)
- Service mesh / discovery tooling (e.g., Consul, Envoy, Istio, Linkerd)
- Networking (WAF, Cloudflare)
- Event-driven architecture (Event Sourcing, CQRS)
- Flink or other stream processing technologies
- RDBMS and NoSQL databases
- Experience working with and developing APIs using REST, gRPC, or GraphQL
- Professional experience in Java, Go, or Python