Senior AI QA Engineer
Job Description
Key Responsibilities:
- Architect Multi-Layered Validation Frameworks: Design and implement structured testing strategies that combine deterministic checks, semantic similarity metrics, and model-based evaluations.
- Automated Model Grading: Develop systems to evaluate clinical pipeline outputs for faithfulness, safety, and hallucination detection using various automated scoring techniques (e.g., BERTScore, ROUGE, or custom heuristics).
- Vibe-Driven Development: Leverage agentic AI tools to rapidly prototype complex test harnesses, "red-team" clinical logic, and build internal validation utilities at high velocity.
- Data Pipeline Integrity: Execute integration and regression tests for data-heavy backend processes, ensuring medical data remains consistent from ingestion to insight generation.
- Collaborative Strategy: Work closely with Data Scientists and Product Managers to define "Ground Truth" datasets and clinical evaluation rubrics.
- Root Cause Analysis: Deep-dive into complex system failures to identify whether issues stem from code logic, data drift, or model behavior.
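For illustration only, the layered approach named in the first responsibility (deterministic checks, then a similarity metric, then a model-based evaluation) might be sketched in Python as below. Every function name, threshold, and sample string here is hypothetical and not taken from any actual pipeline. The standard library's `difflib.SequenceMatcher` is a purely lexical stand-in for a real semantic metric such as BERTScore, and the `judge` callable is a placeholder for a model-based grader.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def deterministic_check(output: str, required_terms: Sequence[str]) -> bool:
    """Layer 1: every required term must appear in the output (case-insensitive)."""
    text = output.lower()
    return all(term.lower() in text for term in required_terms)


def similarity_score(output: str, reference: str) -> float:
    """Layer 2: lexical similarity in [0.0, 1.0]; a stand-in for a semantic metric."""
    return SequenceMatcher(None, output.lower(), reference.lower()).ratio()


def validate(
    output: str,
    reference: str,
    required_terms: Sequence[str],
    min_similarity: float = 0.6,  # hypothetical threshold, tune per rubric
    judge: Optional[Callable[[str, str], bool]] = None,  # hypothetical model grader
) -> dict:
    """Run the layers in order of cost and stop at the first failure."""
    report = {"deterministic": False, "similarity": None, "judge": None, "passed": False}

    report["deterministic"] = deterministic_check(output, required_terms)
    if not report["deterministic"]:
        return report

    report["similarity"] = similarity_score(output, reference)
    if report["similarity"] < min_similarity:
        return report

    # Layer 3 runs only when a grader is supplied; without one, layers 1-2 decide.
    if judge is not None:
        report["judge"] = judge(output, reference)
    report["passed"] = report["judge"] is not False
    return report
```

Ordering the layers from cheapest to most expensive means a failed string check never triggers a model call, which keeps a large regression suite fast and makes each failure easier to attribute to a specific layer.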
Requirements:
- 6+ years of technical experience in Quality Assurance, with a strong focus on system architecture and backend data validation.
- Advanced Python Proficiency: Expert-level skills in Python for building custom test scripts and working within AI/ML ecosystems.
- AI/ML Validation Experience: Proven experience testing model outputs using diverse metrics (e.g., semantic similarity, NLP metrics, and custom scoring algorithms). Familiarity with tools like LangSmith, Arize Phoenix, or MLflow.
- Vibe Coding Mastery: Proficiency with AI-native development tools such as Cursor, Windsurf, or Claude Code.
- Modern Automation Stack: Proficiency with frameworks like pytest, and experience with traditional tools (Cypress, Playwright, or Selenium) for end-to-end flows.
- API & Data Testing: Solid understanding of API testing (Postman/REST Assured) and database validation (SQL/NoSQL).
- Strategic Thinking: Experience serving as a Subject Matter Expert (SME) on test process development for large-scale systems.
Preferred (Good to Have):
- Clinical Domain Knowledge: Familiarity with healthcare data standards such as FHIR, HL7, or SNOMED CT.
- Healthcare Compliance: Understanding of HIPAA regulations and the nuances of handling ePHI/PHI.
- Statistical Foundation: Understanding of sensitivity, specificity, and F1 scores in a clinical context.
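As a quick reference for the statistical terms above, the following Python sketch computes sensitivity, specificity, and F1 from confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration; the formulas are the standard definitions.

```python
def clinical_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, precision, and F1 from raw counts.

    tp/fp/tn/fn are true/false positives/negatives. Any ratio with a zero
    denominator is reported as 0.0 rather than raising an error.
    """
    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # share of actual positives that were caught
    specificity = ratio(tn, tn + fp)  # share of actual negatives correctly cleared
    precision = ratio(tp, tp + fp)    # share of flagged cases that were real
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 80 conditions caught, 20 missed, 90 correctly cleared, 10 false alarms.
metrics = clinical_metrics(tp=80, fp=10, tn=90, fn=20)
```

With these example counts, sensitivity is 0.8 and specificity is 0.9. In clinical settings the two are usually reported separately, since a missed condition (false negative) and a false alarm (false positive) tend to carry very different costs.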