Site Reliability Engineer
Peoplefy
Job Description
Primary Skills (Must Haves):
- Soft Skills:
  - Excellent communication skills, both written and verbal.
  - Strong problem-solving and troubleshooting capabilities.
- Technical Skills:
  - Experience in Application Support in Windows or Linux environments.
  - Proficiency in Application Performance Monitoring (APM) tools like Dynatrace (preferred), New Relic, Datadog, or AppDynamics.
  - Experience with Splunk or ELK Stack (Elasticsearch, Logstash, Kibana), including the ability to write queries in Splunk.
  - Experience with containerization, including working with Docker and Kubernetes.
  - Strong troubleshooting skills, with a focus on end-to-end application support.
- Preference will be given to candidates from product-based companies who have hands-on experience with APM tools and application monitoring.
Key Responsibilities:
- Provide end-to-end application support in Windows and Linux environments.
- Use APM tools such as Dynatrace and similar platforms to monitor performance and to identify and resolve issues.
- Write Splunk queries for effective log analysis and troubleshooting.
- Work with containerized environments (Docker, Kubernetes), including tasks like pod creation and management.
- Diagnose and resolve complex issues related to application performance, functionality, and system health.
- Collaborate with cross-functional teams to ensure the optimal performance of applications and infrastructure.
- Build and maintain a strong understanding of the web architecture to improve application support and performance.
Education:
- Any full-time graduate degree.