An MCP server that audits AI agent reasoning before decisions are committed

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

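SENTINEL, developed by Andrew Espira, is an MCP server that audits and governs the reasoning of AI agents making high-stakes decisions in real time. It runs behind Solo.io's agentgateway. The tool detects gaps between an agent's confidence and its actual accuracy in order to prevent errors: in the health-insurance example, the agent reports a high 89% confidence even though it has not noticed a policy change. It also integrates with agentgateway's RBAC and audit-logging features to provide per-tool access control and stronger security. Through a four-stage pipeline that checks signal freshness, completeness, and weighting, SENTINEL verifies that an agent's decision is structurally sound before it executes.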

Body

When an AI agent makes a high-stakes decision (approving a claim, escalating an incident, authorising a procedure), knowing what it decided isn't enough. You need to know whether the reasoning behind that decision was structurally sound before it commits.

That's what SENTINEL does. It sits behind agentgateway as an MCP server and audits agent reasoning quality in real time, checking whether evidence was complete, current, and properly weighted before a decision goes through.

I built SENTINEL for the AI Agent & MCP Hackathon (Secure & Govern MCP track). It demonstrates how agentgateway's MCP governance primitives (RBAC, session management, audit logging) can be combined with domain-specific reasoning audits to create a practical governance layer for autonomous agents.

Consider a healthcare AI agent, MedAgent, processing prior authorisation decisions. It evaluates whether an insurance payer will approve or deny a requested procedure. On Aetna claims, MedAgent reports 89% confidence. That number looks fine in any dashboard. But SENTINEL has been tracking actual outcomes for 60 days. The reality: without SENTINEL, every one of those decisions auto-executes. Patients receive incorrect denials. Appeals pile up. Revenue leaks. Nobody notices until a quarterly audit, weeks or months later.

Figure: SENTINEL Observatory Dashboard. The reliability heatmap shows Aetna's accuracy declining week over week while UnitedHealthcare remains healthy.

SENTINEL is built in Go and runs as an MCP server (Streamable HTTP transport) behind Solo.io's agentgateway.
The architecture:

    MCP Clients (Claude, GPT, Custom Agents)
                 │
                 ▼
    ┌─────────────────────────────────┐
    │ agentgateway (:3000)            │
    │ • CEL-based RBAC per MCP tool   │
    │ • Session management            │
    │ • Audit logging (all calls)     │
    │ • CORS for playground access    │
    └────────────┬────────────────────┘
                 │ Streamable HTTP
                 ▼
    ┌─────────────────────────────────┐
    │ SENTINEL MCP Server (:8081)     │
    │ 4 MCP Tools                     │
    └────────────┬────────────────────┘
                 │
        ┌────────┼────────┬───────────┐
        ▼        ▼        ▼           ▼
     Fidelity  Pattern  Reliability  Authority
     Auditor   Library  Scorer       Gate

agentgateway adds the security and governance layer that SENTINEL itself doesn't need to implement. CEL policies control who can call which tool:

    mcpAuthorization:
      rules:
        # Public: anyone can read failure patterns
        - 'mcp.tool.name == "sentinel_patterns"'
        - 'mcp.tool.name == "sentinel_reliability"'
        # Authenticated: JWT required to run evaluations
        - 'mcp.tool.name == "sentinel_evaluate" && has(jwt.sub)'
        # Privileged: operator role to pull from Datadog
        - 'mcp.tool.name == "sentinel_pull_decisions" && has(jwt.sub) && "operator" in jwt.roles'

This means read-only tools (patterns, reliability) are accessible to any MCP client. Running the evaluation pipeline requires authentication. Pulling raw decision events from Datadog requires operator privileges. All invocations are audit-logged through agentgateway.

Figure: agentgateway UI. SENTINEL's MCP tools are visible in the gateway with tool-level RBAC enforcement via CEL policies.

When an agent decision arrives at sentinel_evaluate, it passes through four stages:

SENTINEL inspects every piece of evidence the agent retrieved. For each signal (payer policy, patient history, step therapy docs, clinical criteria), it checks:

    STALE_POLICY flag (CRITICAL)
    INCOMPLETE_RETRIEVAL flag
    TIMEOUT_ON_CRITICAL flag
    WEIGHT_DIVERGENCE flag
    MISSING_SIGNAL flag
    CONFIDENCE_MISMATCH flag

The output is a fidelity score (0.0–1.0) with per-signal audit details and suggested fixes.
The fidelity flags are matched against a library of learned failure signatures. Each pattern carries historical accuracy data: Pattern accuracy updates via exponential moving average as new outcomes resolve.

Figure: SENTINEL Pattern Library. Each failure signature carries historical accuracy data. Pattern Delta (Stale + Incomplete) has only 23% accuracy.

SENTINEL maintains rolling accuracy profiles per agent × payer combination. It computes:

Figure: Confidence vs Reality Gap. The agent reports ~89% confidence but actual accuracy is ~72%. The Aetna drift chart shows the 8-week decline from 67% to 0%.

The gate combines fidelity, pattern, and reliability data into a verdict:

Every verdict is emitted to Datadog as a custom event with full context: decision ID, authority level, fidelity score, pattern detected, reliability score. SENTINEL also creates Datadog monitors that alert when per-payer accuracy crosses a threshold. This is the drift detection feedback loop.

Each decision is logged to Braintrust as an eval span. When outcomes resolve (days later), the ground truth is attached. This gives a persistent eval dataset: how often SENTINEL's verdicts were correct, which patterns were misclassified, and calibration quality over time.

When the authority gate issues HUMAN_REQUIRED or QUARANTINE, SENTINEL automatically creates a Cleric incident with the full decision context: the original agent reasoning, the fidelity audit, the pattern match, the reliability profile. A human billing spec
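The exponential-moving-average update for pattern accuracy and the confidence-versus-accuracy gap mentioned in this section can be sketched in Go as follows. This is a minimal illustration: the function names `UpdateAccuracy` and `ConfidenceGap` and the smoothing factor `alpha = 0.1` are assumptions, as the article does not give SENTINEL's actual parameters. The 0.23, 0.89, and 0.72 inputs are the figures quoted in the text.

```go
package main

import "fmt"

// UpdateAccuracy folds one resolved outcome into a pattern's running accuracy
// using an exponential moving average. alpha (between 0 and 1) is an assumed
// smoothing factor: higher values react faster to recent outcomes.
func UpdateAccuracy(current float64, correct bool, alpha float64) float64 {
	outcome := 0.0
	if correct {
		outcome = 1.0
	}
	return alpha*outcome + (1-alpha)*current
}

// ConfidenceGap returns the agent's reported confidence minus its observed
// accuracy. A large positive gap means the agent is overconfident.
func ConfidenceGap(reportedConfidence, observedAccuracy float64) float64 {
	return reportedConfidence - observedAccuracy
}

func main() {
	// Start from the 23% accuracy quoted for Pattern Delta and fold in
	// three hypothetical resolved outcomes.
	acc := 0.23
	for _, correct := range []bool{false, true, false} {
		acc = UpdateAccuracy(acc, correct, 0.1)
	}
	fmt.Printf("pattern accuracy after 3 outcomes: %.3f\n", acc)

	// The article's example: ~89% reported confidence, ~72% actual accuracy.
	fmt.Printf("confidence gap: %.2f\n", ConfidenceGap(0.89, 0.72))
}
```

An exponential moving average needs only the previous value and the newest outcome, so no per-pattern history has to be stored. The trade-off is that the choice of `alpha` decides how quickly old outcomes are forgotten.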

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
