Observability Engineer Lead
Equifax
Job Description
Key Accountabilities
- Develop and maintain observability using AWS/GCP tools and Datadog.
- Keep monitoring tool software current across the cloud and legacy landscape
- Keep Engineering updated on logging and tracing standards
- Good knowledge of Splunk or other logging tools like ELK stack
- Have a good understanding of Application Performance Management
- Implement best practices for observability, including metrics, logging, and tracing.
- Collaborate with engineering and operations teams to troubleshoot and resolve performance issues.
- Automate observability processes and integrate them into CI/CD pipelines.
- Analyze and interpret monitoring data to provide actionable insights and recommendations.
- Stay updated with the latest advancements in GCP and Datadog to continuously improve our observability capabilities
- Good knowledge of Linux/Windows environments
- Work within the Scaled Agile Framework
- Solve problems and triage complex distributed architecture service maps. Be on call for high-severity application incidents and improve runbooks to reduce MTTR
- Lead blameless availability postmortems and own the call to action to prevent recurrences
What experience you need
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required
- 5+ years’ experience with monitoring tools such as Google/AWS Cloud Monitoring, AppDynamics, Datadog, Splunk, Elasticsearch, or similar
- 5+ years’ experience in system support, coding or operations.
- Hands-on experience with Windows/Linux environments
- Excellent problem-solving and communication skills
- Provide step-by-step technical help, both written and verbal
- Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js
- Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
- System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.)
- Proficiency with continuous integration and continuous delivery tooling and practices
- Cloud certification strongly preferred