SRE - Systems
thalesgroup
Job Description
Responsibilities
- Apply the core SRE tenets: measurement (SLI/SLO/SLA), toil elimination, and reliability modeling
- Enable and educate development teams on industry best-practice design patterns, ways of working, and operational knowledge to ensure platform continuity
- Architect and develop solutions for the infrastructure and operational aspects of new products and feature sets
- Assist with go/no-go pre-planning, verification/validation, and review of new and existing products and services
- Proactively analyze data and test the integrity of networks and systems to ensure production applications and services are operating optimally
- Work within development teams to troubleshoot and resolve business-affecting issues
- Handle escalations, incident response, root cause analysis (RCA), and blameless postmortems
- Participate in on-call rotation
Qualifications
- At least 3 years of professional experience with cloud-, web-, or CDN-scale infrastructure
- Experience with Python and Go; C/C++ is a plus
- Expert knowledge of Linux systems, network programming, and network protocols (TCP, UDP, DNS, TLS/SSL, HTTP)
- Experience with BGP and Anycast routing is a plus
- Experience with DevOps principles and concepts such as Infrastructure as Code (Ansible/SaltStack), CI/CD (GitLab, Jenkins, Git), and monitoring and visualization (Prometheus, Grafana)
- Experience with big data technologies such as NoSQL and RDBMS databases, Redis, Elasticsearch, and Kafka
- Experience with containers and container management (Docker, Kubernetes)
- Experience building and analyzing telemetry data, data models, data pipelines, and UI visualizations
- Experience developing software for, troubleshooting, and monitoring large-scale distributed systems
- Ability to apply software engineering best practices and standards across the software development life cycle
- Working knowledge of, and experience with, Agile software development methodologies
- A strong team player who takes ownership and responds to business urgency
- Ability to stay organized in a multi-tasking environment
- Self-starter personality