Senior AI Engineer
Job Description
This is a senior technical role building the AI systems that power Credo AI's governance products. You will design and build intelligent systems that observe, reason about, and act on governance knowledge — from the agent infrastructure that monitors AI behavior in enterprise deployments, to the knowledge systems that make governance intelligence accessible and actionable at scale.
You are someone who cares about how AI systems behave, not just whether they perform. You think carefully about reliability, consistency, and failure modes — and you have the engineering discipline to turn that thinking into production systems. You work across the full stack of modern AI engineering: LLM-based agents, retrieval and knowledge systems, evaluation pipelines, and the behavioral configuration layers that shape how AI acts within defined constraints.
You will write code, ship systems, and own technical direction — while collaborating closely with AI researchers, governance experts, and product teams.
What You'll Build
AI agent systems
- Design and implement agent architectures that reason about governance policies and take action within defined constraints
- Build instrumentation that exposes how agents reason and act, making their behavior auditable at the session level
- Design and build the telemetry, filtering, and analytics infrastructure that lets governance owners empirically verify how agents are behaving at organizational scale
- Develop behavioral configuration and constraint systems that encode organizational policies into how AI agents operate
- Architect evaluation frameworks that measure whether agent behavior actually aligns with governance intent
Governance knowledge systems (RAG)
- Develop retrieval and context systems that surface relevant governance knowledge at the right moment, to the right system or user
- Design hybrid retrieval architectures combining semantic search, structured knowledge traversal, and dynamic context assembly
Platform infrastructure
- Contribute to the broader AI systems architecture underlying Credo AI's platform
- Work with data and product teams to translate governance intelligence into reliable, scalable product features
- Establish engineering standards and best practices for AI system development across the team
About You
You have built LLM-based systems in production and you have encountered their failure modes firsthand — inconsistency, instruction-following breakdowns, unexpected behaviors at distribution edges. You think carefully about how to specify, evaluate, and constrain AI behavior, and you bring engineering rigor to problems that sit at the boundary of ML research and systems design.
You read the literature on agent architectures, evaluation methodology, and alignment-adjacent topics not because your job requires it but because you find them genuinely useful. You move fast, ship things, and iterate — but you think carefully about failure modes before they reach production.
Minimum Qualifications
- 5+ years building production AI/ML systems, with meaningful experience shipping LLM-based applications
- Strong experience with agent architectures: tool use, planning, multi-step reasoning, and the failure modes that accompany them
- Evaluation mindset: you have designed evals, run them, and used the results to make systems meaningfully better
- Experience with behavioral shaping techniques: prompt architecture, output validation, policy-grounded constraints
- Solid systems engineering: you can design data pipelines, APIs, and distributed systems that hold up in production
- Experience building monitoring or observability systems for AI applications in production
- Experience with retrieval-augmented generation and the tradeoffs in hybrid retrieval systems
- Strong communicator and collaborator
Preferred Qualifications
- Research background or publications in LLM evaluation, alignment, agent safety, or AI robustness
- Experience with red-teaming, adversarial evaluation, or automated failure detection
- Background in multi-agent systems or systems where multiple AI components interact
- Fam