Site Reliability Engineer
Vestas
Job Description
Responsibilities
- Data Infrastructure & Engineering: Design and develop scalable solutions for storing and retrieving high-volume time-series data. Build robust ETL/ELT pipelines using big data tools on AWS and Azure (e.g., Spark, Kafka, Kinesis, Azure Data Factory).
- CI/CD & MLOps: Develop and manage CI/CD pipelines for ML models and data infrastructure using MLflow, Jenkins, GitHub Actions, and Terraform. Enable reproducible, secure, and scalable deployments.
- Monitoring & Automation: Implement end-to-end monitoring for infrastructure, data pipelines, and ML services using Prometheus, Grafana, and custom alerting tools. Automate workflows to ensure high availability and zero-downtime deployments.
- Release & Incident Management: Manage releases across staging and production environments. Respond to Severity 1 incidents, lead root cause analysis (RCA), and implement permanent fixes.
- Collaboration & Process Improvement: Work with product teams to deploy releases, resolve customer escalations, and drive automation to improve onboarding and integration timelines.
- Analytics Enablement: Build tools to deliver insights into customer acquisition, operational efficiency, and business KPIs.
Qualifications
- Education & Certification: Bachelor’s or Master’s degree in Engineering with 4+ years of experience in SRE or related roles. AWS Certified Solutions Architect - Associate or an Azure certification preferred. Familiarity with DevOps tools such as Jenkins, Terraform, and Ansible.
- Technical Skills: Strong experience with cloud platforms (AWS, Azure), big data technologies (Spark, Kafka, Databricks), and data storage (Amazon S3, Azure Blob Storage). Proficiency in SQL, Python, Java/Scala, and container orchestration (Kubernetes preferred).
- Soft Skills: Excellent communication, documentation, and time-management skills. Strong problem-solving and customer-handling capabilities.
Competencies
- Problem Solving & RCA
- Technical Expertise in Cloud & Data Engineering
- Cross-functional Collaboration
- Adaptability to Evolving Technology
- Customer Focus & Escalation Management
- Innovation & Automation
- Analytical Thinking & Process Optimization