Senior Engineer (Site Reliability Engineer)
bceglobaltech
Job Description
Key Responsibilities
- Ensure the 24/7 operations and reliability of data services in our production GCP and on-premise Hadoop environments.
- Collaborate with the data engineering development team to design, build, and maintain scalable, reliable, and secure data pipelines and systems.
- Develop and implement monitoring, alerting, and incident response strategies to proactively identify and resolve issues before they impact production.
- Drive the implementation of security and reliability best practices across the software development life cycle.
- Contribute to the development of tools and automation to streamline the management and operation of data services.
- Participate in on-call rotation and respond to incidents in a timely and effective manner.
- Continuously evaluate and improve the reliability, scalability, and performance of data services.
Technology Skills
- 4+ years of experience in site reliability engineering or a similar role.
- Strong experience with Google Cloud Platform (GCP) services, including BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
- Experience with on-premise Hadoop environments and related technologies (HDFS, Hive, Spark, etc.).
- Proficiency in at least one programming language (Python, Scala, Java, Go, etc.).
Required qualifications to be successful in this role.
- Bachelor’s degree in computer science engineering or a related field.
- 8-10 years of experience as an SRE.
- Proven experience as an SRE, DevOps engineer, or similar role.
- Strong problem-solving skills and ability to work under pressure.
- Excellent communication and collaboration skills.
- Flexible to work in the EST time zone (9-5 EST).