Senior Engineer, Observability, Automation & Reliability - Splunk
WBD
Job Description
ARCHITECTURE / ENGINEERING / STRATEGY
-
Design, roadmap, and administer tools used in discovering and monitoring WBD’s applications, services, platforms, and infrastructure.
-
Build monitoring systems that assist in infrastructure and application event detection and alert remediation.
-
Collaborate with a cross-functional team of developers, administrators, engineers, and architects to ensure that tools and processes keep pace with observability requirements.
-
Participate in strategy and implementation discussions for redesigning monitoring environments to keep pace with current technology trends.
OPERATIONS
-
Ensure all relevant infrastructure and services are properly covered within our monitoring and alerting systems in a manner consistent with our standards; collect the right metrics at the right frequency, and ensure the data is readily available for effective alerting, reporting, and analysis.
-
Execute planned and ad hoc configuration activities.
-
Create and maintain documentation for monitoring requirements, processes, and implementation.
-
Assist in the deployment, organization, and management of standard operating procedures.
ANALYTICS
-
Define business and operational success metrics and process models for benchmarking, standardization, and process improvement.
-
Report on agreed-upon metrics and measures of success.
Qualifications & Experience:
-
Degree in Computer Science, Information Technology, or related technical field, or equivalent practical experience
-
5+ years of experience in system engineering and/or administration in an enterprise environment
-
Experience installing, configuring, and maintaining Splunk, including Splunk Enterprise and Splunk Cloud
-
Experience managing a Splunk environment including forwarders, heavy forwarders, deployment servers, data ingestion, apps, indexes, clusters, and search queries.
-
Experience with large-scale distributed systems and architecture knowledge (Linux/Unix and Windows operating systems, networking, storage) in a cloud computing or traditional IT infrastructure environment
Preferred (not required) experience:
-
Experience with scripting and automation using one or more of the following: Python, PowerShell, Bash.
-
Experience with configuration management and Infrastructure as Code tools such as Terraform, CloudFormation, Ansible, Puppet, or Chef.