Staff Specialist IT - Site Reliability Engineer
Infineon
Job Description
In your new role you will:
- Design, develop, and maintain large-scale distributed systems, focusing on reliability, scalability, and performance.
- Collaborate with development teams to identify and prioritize system improvements, and develop solutions to meet business needs.
- Develop and implement automation tools and processes to improve efficiency, reduce downtime, and enhance system reliability.
- Monitor systems and troubleshoot issues, identifying root causes and implementing fixes.
- Develop and maintain technical documentation, including system design documents, operational guides, and knowledge base articles.
- Participate in on-call rotations, providing 24/7 support for our systems and infrastructure.
- Collaborate with cross-functional teams to ensure seamless integration of systems and services.
- Stay up to date with industry trends, best practices, and emerging technologies, applying this knowledge to improve our systems and processes.
Your Profile
You are best equipped for this task if you have:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a similar role, with a focus on system reliability, scalability, and performance.
- Strong programming skills in languages such as Python, Java, C++, or Go.
- Experience with cloud-based infrastructure, such as AWS, GCP, or Azure.
- In-depth knowledge of Linux/Unix operating systems, networking protocols, and distributed systems.
- Experience with automation tools such as Ansible, Puppet, or Chef.
- Strong understanding of monitoring and logging tools, such as Prometheus, Grafana, or ELK.
- Excellent problem-solving skills, with the ability to debug complex issues.
- Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams.
- Experience with agile development methodologies and version control systems such as Git.
Nice to Have:
- Experience with containerization and container orchestration using Docker, Kubernetes, or similar technologies.
- Knowledge of security best practices and compliance frameworks such as HIPAA or PCI-DSS.
- Experience with machine learning or artificial intelligence applications.
- Certification in a relevant field, such as AWS Certified DevOps Engineer or Google Cloud Certified - Professional Cloud Developer.