Senior Engineer, Cloud Infrastructure
Cvent
Job Description
Cloud Infrastructure Engineering (AWS)
- Design, implement, and operate highly available, secure, and scalable AWS infrastructure (e.g., VPC, Transit Gateway, EC2, Load Balancing, S3, EBS/EFS/FSx, Route 53, IAM, KMS, Backup).
- Build and maintain infrastructure-as-code using tools such as AWS CDK / CloudFormation, enforcing standards, guardrails, and reusable patterns.
- Develop automation and tooling (primarily in Python/TypeScript) to remove repetitive operational work (provisioning, patching, configuration, cleanup, compliance checks, reporting).
- Contribute to, and at times lead, design reviews, architecture discussions, and RFCs for new or evolving infrastructure services.
- Partner with Security and Compliance to meet security, audit, and regulatory requirements across accounts and regions.
AI Agents, Orchestration & Multi‑Agent Systems
- Identify high‑value Cloud Infra workflows (e.g., incident triage, change impact analysis, runbook execution, capacity/cost recommendations) that can be automated using AI agents.
- Design and implement agentic workflows (single and multi‑agent) using modern AI orchestration patterns and frameworks (e.g., tool‑calling, planners, evaluators, guardrails).
- Integrate agents with existing cloud APIs, observability tools, ticketing systems, and runbooks to provide end‑to‑end, human-in-the-loop automation.
- Define and enforce safety, security, and approval guardrails for AI‑driven actions (RBAC, policy checks, dry‑runs, explicit approvals, audit logging).
- Measure and communicate the impact of AI automation (MTTR reduction, hours saved, error reduction, cost optimization, improved engineer experience).
Reliability, Operations & On‑Call
- Own the reliability and performance of services you build – from design through deployment and production operations.
- Implement and tune monitoring, logging, alerting, and SLO/SLA dashboards for Cloud Infra services (Datadog/Splunk/CloudWatch or similar).
- Participate in the on‑call rotation, lead troubleshooting for complex AWS infrastructure incidents, and drive post‑incident reviews and preventative improvements.
- Proactively identify technical debt and reliability risks in infrastructure and drive remediation plans.
Collaboration, Mentoring & Best Practices
- Act as a technical mentor to Engineer I/II teammates on AWS fundamentals, automation patterns, and AI‑driven operations.
- Help define and evolve paved-road standards for AWS infrastructure, automation, and AI agent usage across Cloud Infrastructure.
- Contribute to runbooks, design docs, knowledge base articles, and internal training sessions, including AI and automation best practices.
Here's What You Need:
Required Qualifications
- 3–6 years of hands-on experience in Cloud / Infrastructure Engineering or a similar role, with a strong focus on AWS.
- Deep understanding of core AWS services: VPC & networking (subnets, routing, TGW, VPN/Direct Connect, security groups, NACLs), EC2, Auto Scaling, Load Balancing, S3, EBS/EFS/FSx, Route 53, IAM, KMS, CloudWatch/CloudTrail, and Backup.
- Strong experience with Infrastructure-as-Code (AWS CDK, CloudFormation, or Terraform) and Git-based workflows (branching, PR reviews, CI/CD).
- Solid programming skills in at least one language commonly used for infrastructure automation, such as Python or TypeScript/Node.js.
- Proven track record designing and operating production-grade, multi‑account/multi‑region AWS environments with a focus on security, reliability, and cost.
- Experience implementing observability for infrastructure services (metrics, logs, traces, alerting, dashboards).
- Demonstrated ability to own complex projects end-to-end: requirements, design, implementation, rollout, and post‑launc