AWS DevOps Python Engineer
Persistent
Job Description
- Design, develop, and maintain monitoring and alerting solutions using Python and AWS services to ensure high availability and reliability.
- Build event‑driven monitoring workflows using AWS Lambda for real‑time processing of logs, metrics, and events.
- Configure and manage Amazon CloudWatch metrics, logs, dashboards, alarms, and anomaly detection.
- Implement automated alerts and notifications using Amazon SNS, integrating with email, messaging platforms, and incident management tools.
- Use Amazon EventBridge to capture system events and trigger alerting or remediation workflows.
- Develop custom health checks and monitoring scripts in Python for proactive issue detection.
- Monitor application performance across AWS environments, including error rates, latency, resource utilization, and overall system health.
- Support incident response by providing actionable alerts and root cause analysis data, and by continuously improving alert quality (reducing false positives).
- Collaborate with DevOps, SRE, and application teams to align monitoring standards, SLAs, and best practices.