SRE
thalesgroup
Job Description
Skills & Competencies
- At least 3 years of professional experience with cloud/web/CDN-scale infrastructure
- Experience with Python and Go; C/C++ is a plus
- Expert knowledge of Linux systems, network programming, and protocols (TCP, UDP, DNS, TLS/SSL, HTTP)
- Experience with BGP and Anycast routing is a plus
- Experience with DevOps principles and concepts such as Infrastructure as Code (Ansible/SaltStack), CI/CD (GitLab, Jenkins, Git), and monitoring and visualization (Prometheus, Grafana)
- Experience with big data technologies such as NoSQL/RDBMS, Redis, Elasticsearch, and Kafka
- Experience with containers and container management (Docker, Kubernetes)
- Experience analyzing and building data telemetry, modeling, pipelines, and UI visualization
- Experience in software development and in troubleshooting and monitoring large-scale distributed systems
Responsibilities
- Apply core SRE tenets: measurement (SLI/SLO/SLA), toil elimination, and reliability modeling
- Enable and educate development teams on industry best-practice design patterns, ways of working, and operational knowledge to ensure platform continuity
- Develop and architect solutions for the infrastructure and operational aspects of new products and feature sets
- Assist with go/no-go preplanning, verification/validation, and review of existing and new products/services
- Proactively analyze data and test the integrity of networks/systems to ensure production applications and services are operating optimally
- Work within development teams to troubleshoot and resolve business-affecting issues
- Handle escalations, incident response, root cause analysis (RCA), and blameless postmortems
- Participate in the on-call rotation