Show HN: I caught AI agents self-approving their own work, so I built this

Source: hackernews · Summary and analysis by Genesis Park

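Summary

Governor is an open-source governance tool, built from more than 5,000 task executions in a production environment, that is designed to prevent AI agent failures such as approving their own work without review or deploying without a rollback plan. Before work is submitted it enforces guards, including a self-review check (EG-01), an evidence check (EG-07), and a check that the assigned deliverables exist, so that unverified output is blocked before it reaches production. A state transition engine separates the executor and reviewer roles and records every step in an audit trail, which helps keep agent quality and security consistent with no external dependencies.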

Main text

Quality gates for AI agents. Guards that don't get tired.

```
pip install ai-governor
```

```python
from governor.backend.memory_backend import MemoryBackend
from governor.engine.transition_engine import TransitionEngine
import governor.guards.executor_guards  # noqa: F401

backend = MemoryBackend()
engine = TransitionEngine(backend=backend)

# Try to submit without a self-review: Governor blocks it
result = engine.transition_task("TASK_001", "READY_FOR_REVIEW", "DEVELOPER")
# FAIL: EG-01: No self-review found. Fix: Create a self-review before submission.
```

Your agents produce output. But who checks it before it hits production?

Governor was extracted from a system with 5,000+ governed task executions. See docs/PROOF.md for anonymized evidence.

| | Without Governance | With Governor |
|---|---|---|
| Agent submits work | Goes straight to production | Hits guard evaluation first |
| Missing self-review | Nobody notices | EG-01 blocks transition |
| Deploy without rollback plan | Hope for the best | EG-06 blocks until rollback documented |
| Evidence claims | Trust the agent | EG-07 requires multi-source verification |
| Audit trail | What audit trail? | Transition outcomes returned consistently; persistable via backend + graph model |
| Quality over time | Degrades silently | Scoring rubric enforces consistent standards |

These failure modes aren't hypothetical. We've seen each one in production.

The Silent Deploy: An agent marked a deployment task as complete. No rollback plan. Production went down two hours later. Governor's EG-06 blocks any DEPLOY task that doesn't mention a rollback strategy. The task stays in ACTIVE until the agent adds one.

The Missing Evidence: An investigation came back "done" with a two-sentence summary. No sources, no linked reports. Plausible but unverifiable. EG-02 requires a linked report for INVESTIGATION and AUDIT tasks. EG-07 checks for multi-source evidence. Thin output gets blocked.

The Self-Approver: An agent submitted work and approved it in the same step. No second pair of eyes ever evaluated the output. Governor's state machine enforces role separation. EXECUTOR submits. REVIEWER approves. One role cannot do both.

The Quality Slide: Early on, every agent output was manually reviewed. Volume increased. Reviews got faster and shallower. Quality degraded, and nobody noticed until a customer did. The scoring rubric provides a consistent quality signal on every task. Guards don't get tired.

See docs/WHY.md for the full analysis.

Governor enforces a loop:

- Agent submits work: transitions task to READY_FOR_REVIEW
- Guards evaluate: pluggable validation functions check preconditions
- PASS or FAIL: reviewer approves (task completes) or rejects (task reworks)
- Audit trail: every transition records structured guard outcomes for persistence and analytics

```
ACTIVE ──> READY_FOR_REVIEW ──> COMPLETED
               │         ^
               │         │
               v         │
                 REWORK
```

Every transition requires authorization (role-based) and validation (guard-based). No task moves forward without passing its guards.

Transition writes use optimistic concurrency (`expected_current_status`) to prevent lost updates under concurrent callers. If task state changes between read and write, the transition fails with STATE_CONFLICT and no state mutation is applied.

Governor records every transition attempt as a TransitionEvent with attached GuardEvaluation records. This enables:

- full per-task audit trails
- guard failure hotspot analysis
- policy coverage metrics
- rework lineage analysis

```
pip install ai-governor
```

```python
from governor.backend.memory_backend import MemoryBackend
from governor.engine.transition_engine import TransitionEngine
import governor.guards.executor_guards  # noqa: F401

backend = MemoryBackend()
engine = TransitionEngine(backend=backend)
```

Zero external dependencies. In-memory backend. Ready in 4 lines.
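The transition rules described above (role-based authorization, guard validation, and a write that checks the expected current status) can be sketched without the library. This is a minimal illustration under stated assumptions, not Governor's implementation: `TinyStore`, `ALLOWED`, `transition`, and the `NOT_AUTHORIZED` and `GUARD_FAILED` codes are hypothetical names, and the REWORK edges are one reading of the state diagram. Only the status names, the two roles, and `STATE_CONFLICT` come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (from_status, to_status) -> the one role allowed to move it.
ALLOWED = {
    ("ACTIVE", "READY_FOR_REVIEW"): "EXECUTOR",
    ("READY_FOR_REVIEW", "COMPLETED"): "REVIEWER",
    ("READY_FOR_REVIEW", "REWORK"): "REVIEWER",
    ("REWORK", "READY_FOR_REVIEW"): "EXECUTOR",  # assumed resubmission edge
}


@dataclass
class TinyStore:
    """Hypothetical in-memory task store with a compare-and-set write."""
    statuses: dict = field(default_factory=dict)

    def compare_and_set(self, task_id, expected_current_status, new_status):
        # Write only if the status is still what the caller read earlier.
        if self.statuses.get(task_id) != expected_current_status:
            return False
        self.statuses[task_id] = new_status
        return True


def transition(store, task_id, target, role, guards=()):
    """Authorize by role, run guards, then write with optimistic concurrency."""
    current = store.statuses[task_id]  # read
    if ALLOWED.get((current, target)) != role:
        return {"result": "FAIL", "code": "NOT_AUTHORIZED"}
    # Each guard returns None on success or a failure message.
    failures = [msg for guard in guards if (msg := guard(task_id)) is not None]
    if failures:
        return {"result": "FAIL", "code": "GUARD_FAILED", "failures": failures}
    if not store.compare_and_set(task_id, current, target):
        # Someone changed the task between our read and our write.
        return {"result": "FAIL", "code": "STATE_CONFLICT"}
    return {"result": "PASS", "status": target}


if __name__ == "__main__":
    store = TinyStore({"TASK_001": "ACTIVE"})
    print(transition(store, "TASK_001", "READY_FOR_REVIEW", "EXECUTOR"))
    # The executor now tries to approve its own submission.
    print(transition(store, "TASK_001", "COMPLETED", "EXECUTOR"))
    print(transition(store, "TASK_001", "COMPLETED", "REVIEWER"))
```

Keeping the allowed role in the transition table means self-approval is rejected by lookup instead of by a special case, and the compare-and-set write leaves state untouched whenever a conflict is detected.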
Or install from source:

```
git clone https://github.com/june-jule/ai-governor.git
cd ai-governor
pip install -e ".[dev]"
```

Terminal output from `python examples/full_task_lifecycle.py`:

```
============================================================
Governor — Full Task Lifecycle Demo
============================================================

[1] Created task: TASK_DEMO_001 (status=ACTIVE)

[2] Available transitions for EXECUTOR from ACTIVE:
    -> READY_FOR_REVIEW  NOT READY (2 guards unmet)
       EG-01: No SELF_REVIEW found
       EG-03: Missing deliverables: auth.py, auth_test.py

[3] Dry-run submission: FAIL (state unchanged)

[4] Submit without self-review (expect failure):
    Result: FAIL
    FAIL EG-01: No SELF_REVIEW found
         Fix: Create a self-review before submission
    FAIL EG-03: Missing deliverables: auth.py, auth_test.py
         Fix: Ensure all stated deliverables exist on filesystem

[5] Added report + self-review (EG-01 and EG-03 now satisfied)
    Submit for review: PASS

[6] Reviewer approves: PASS
    Task COMPLETED!

============================================================
Final status: COMPLETED
============================================================
```

```python
import governor.guards.executor_guards  # noqa: F401

# Create a task (starts i
```
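The demo output pairs each failed guard with a fix hint, and the README mentions dry runs and guard failure hotspot analysis. The sketch below shows one way such pluggable guards could be structured. It does not use Governor's API: `GuardResult`, the `guard` decorator, `evaluate`, `hotspots`, the task dictionary fields, and the EG-06 message text are hypothetical. Only the guard IDs and the EG-01 message and fix text are taken from the output above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GuardResult:
    guard_id: str
    passed: bool
    message: str = ""
    fix: str = ""


# Hypothetical registry: defining a guard with @guard adds it to this list.
GUARDS: list[Callable[[dict], GuardResult]] = []


def guard(fn):
    GUARDS.append(fn)
    return fn


@guard
def self_review_present(task: dict) -> GuardResult:
    # Stand-in for the EG-01 idea: a self-review must exist before submission.
    if task.get("self_review"):
        return GuardResult("EG-01", True)
    return GuardResult("EG-01", False, "No SELF_REVIEW found",
                       "Create a self-review before submission")


@guard
def rollback_documented(task: dict) -> GuardResult:
    # Stand-in for the EG-06 idea: DEPLOY tasks must mention a rollback strategy.
    if task.get("type") != "DEPLOY" or "rollback" in task.get("content", "").lower():
        return GuardResult("EG-06", True)
    return GuardResult("EG-06", False, "No rollback plan documented",
                       "Describe how the deployment can be rolled back")


def evaluate(task: dict) -> list[GuardResult]:
    """Dry run: evaluate every registered guard without changing the task."""
    return [g(task) for g in GUARDS]


def hotspots(history: list[list[GuardResult]]) -> Counter:
    """Count failures per guard ID across recorded evaluations."""
    return Counter(r.guard_id for results in history for r in results if not r.passed)


if __name__ == "__main__":
    task = {"type": "DEPLOY", "content": "Ship v2 to production"}
    first = evaluate(task)
    for r in first:
        if not r.passed:
            print(f"FAIL {r.guard_id}: {r.message}\n     Fix: {r.fix}")
    task.update(self_review="Checked diff and tests",
                content="Ship v2; rollback: redeploy v1")
    second = evaluate(task)
    print(hotspots([first, second]))
```

Returning a structured result instead of raising lets a caller report every unmet guard at once, which is what makes both the dry-run listing and the failure counts straightforward.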

This analysis was written by the Genesis Park editorial team with the help of AI. The original text is available through the source link.
