Engineer - DevOps/Site Reliability Engineering (SRE)
Qualcomm
Job Description
Responsibilities
-
Advocate for quality practices, including the development of automated testing to improve business processes.
-
Act as a critical part of a multi-team effort to deliver, manage and maintain configuration automation to meet business needs.
-
Create and maintain configuration standards for software and infrastructure.
-
Manage CI & CD tools and pipelines as a partner to development and QA teams.
-
Develop and socialize operational standards for teams throughout engineering.
-
Recommend, develop and implement system enhancements that improve performance and reliability, including installation, upgrading/patching, monitoring, problem resolution, configuration management and security.
-
Oversee critical incidents and major system escalations from initiation to resolution.
-
Create mechanisms/architectures that enable fault tolerance and rapid recovery from failure.
-
Participate in a rotating on-call escalation service.
-
Perform capacity planning and chaos engineering.
-
Communicate clearly, both verbally and in writing.
Qualifications
-
Bachelor’s degree in a technical field, or equivalent experience
-
1+ years of experience in an operational environment (preferred)
Technical Requirements
-
Experience with Linux operating systems in production and development environments
-
Experience in network and server engineering
-
Experience with automation/configuration management such as Ansible, Chef, Puppet or equivalent
-
Experience with workflow data pipeline management services such as Airflow and/or Luigi
-
Expertise in the latest cloud compute, load balancing and scaling, storage, networking, security, and virtualization technologies with cloud providers such as GCP (preferred), AWS and/or Azure
-
Demonstrated experience installing, operating, and troubleshooting a variety of open-source technologies
-
Experience with relational and non-relational databases
-
Practical experience developing software or meeting operational needs with code and scripting (Bash, Python, Perl, Ruby, and/or Java)
-
Experience with software quality principles and associated tools for testing and analysis.
-
Knowledge of CI & CD practices and supporting tools (Jenkins, Bamboo, or similar)
-
Experience with IaC Technologies such as Terraform, CloudFormation or Pulumi
-
Experience with PaaS technologies such as containers, container orchestration and scheduling, service registration/discovery and monitoring (Docker, Kubernetes, etc.)
-
Load, scalability, systems, or performance testing experience
-
Observability and monitoring expertise, including the ability to analyze data to find the root cause of system and infrastructure issues