Senior Site Reliability Engineer
globalpayments
Job Description
Key Responsibilities:
- Maintain and troubleshoot production cloud infrastructure to ensure optimal performance and uptime.
- Apply patches, updates, and security configurations across cloud environments.
- Execute scheduled releases, ensuring successful deployment of application updates.
- Manage and optimize CI/CD pipelines using Jenkins, implementing best practices for automated testing and deployment.
- Monitor system performance and provide incident response to resolve production issues promptly.
- Collaborate with development and operations teams to implement scalable and secure cloud solutions.
- Develop and maintain technical documentation for cloud architecture, processes, and configurations.
- Implement monitoring and logging to proactively identify and resolve system issues.
- Assist in disaster recovery planning and execution to ensure data integrity and business continuity.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
- 2+ years of experience in cloud engineering or system administration.
- Strong proficiency with AWS services, including EC2, S3, RDS, Lambda, and VPC.
- Knowledge of logging and monitoring frameworks (e.g., ELK stack, Splunk).
- Hands-on experience with Jenkins for CI/CD pipeline management.
- Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
- Strong scripting skills in Bash, Python, or similar languages.
- Experience with monitoring tools such as CloudWatch, Datadog, or Prometheus.
- Excellent problem-solving and troubleshooting skills.
- Ability to work effectively in a fast-paced, collaborative environment.
- Written and spoken proficiency in English.
Preferred Skills:
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with security best practices for cloud deployments.
- Understanding of Agile and DevOps methodologies.