Show HN: Self-Evolving Skill – Empirical Results from a 5-Round Experiment

#claude #claude code #design pattern #self-evolving #skill #tip
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

In a five-round self-evolving AI experiment against a MySQL database, 63.6% of interactions were filtered out as producing no knowledge change, and final accuracy reached 100%. Along the way, incremental convergence and self-correction fixed two erroneous rules, with side benefits such as uncovering problems in tool usage. Cross-validation of the results is now underway on a larger telecom-billing database.

Full text

A design pattern for Claude Code Skills that improve through use — growing more accurate and efficient over time, without bloating.

Note (academic positioning): this pattern corresponds to Inter-test-time Context Evolution with Text-Feedback Governance in the self-evolving agent literature. See Gao et al. (2026), "A Survey of Self-Evolving Agents."

Traditional Skills are static — an author packages them once, users invoke them repeatedly, and knowledge never grows. But in domains like database investigation, codebase analysis, and business system integration, an AI continuously discovers valuable domain knowledge during use — table relationships, query patterns, business rules, data characteristics. Without a way to persist this knowledge, every new session starts from zero, wasting both effort and context window.

Is this pattern right for your use case? Ask two questions:

- Will domain knowledge grow through use?
- Does that growth have a natural ceiling?

If both answers are yes, this pattern fits.

```
skill-name/
├── SKILL.md              # Trigger conditions + governance protocol
├── scripts/              # Execution tools
│   ├── core/             # Computation layer (decay model)
│   │   ├── formulas.py   # Atomic formulas
│   │   ├── models.py     # Composite models + config
│   │   └── parser.py     # Decay tag parser
│   ├── decay_engine.py   # CLI: init / scan / feedback / reset / inject / search
│   └── *.py              # Domain-specific tools
└── references/           # Living knowledge base (AI-maintained)
    ├── _index.md         # Routing table
    └── *.md              # Topic files with decay-tagged entries
```

The reference implementation is a Self-Evolving Skill for MySQL database investigation. Install it to see the pattern in action on your own database.

Prerequisites:

- An AI coding agent (Claude Code, Cursor, Windsurf, Codex, etc.)
- Node.js ≥ 18 and Python ≥ 3.8

```
pip install pymysql
```

macOS / Linux:

```
npx skills add 191341025/Self-Evolving-Skill --skill db-investigator
```

Windows:

```
npx skills add 191341025/Self-Evolving-Skill --skill db-investigator --copy -y
```

`--copy` bypasses Windows symlink permission issues; `-y` skips interactive agent selection.

Run setup.py from the installed skill directory:

```
# Find your agent's skill path (one of these will exist):
# .claude/skills/  .cursor/skills/  .windsurf/skills/  .continue/skills/
python /skills/db-investigator/scripts/setup.py
```

The interactive wizard collects your MySQL connection details, tests the connection, and initializes the knowledge system. Tip: or just start a conversation and ask a database question — if unconfigured, the skill will tell you exactly what to run.

Start a Claude Code conversation and ask any database question. The skill activates automatically and begins evolving its domain knowledge through use.

This is the core of the pattern. It prevents the knowledge base from degrading into noise.

Gate 1 — VALUE
Q: Can this knowledge be reused across sessions?
→ One-time result (e.g., "query returned 42 rows at 3pm") → REJECT
→ Reusable pattern or stable fact → PASS

Gate 2 — ALIGNMENT
Q: Does this contradict existing knowledge?
→ Contradiction found → CORRECT the existing entry (don't append)
→ Consistent → PASS

Gate 3 — REDUNDANCY
Q: Does this already exist, possibly worded differently?
→ Exists → MERGE into existing entry, or skip
→ Doesn't exist → PASS

Gate 4 — FRESHNESS (write)
Classify knowledge type and attach decay metadata:
→ confirmed=<date>, C0=1.0
→ Six types: schema | business_rule | tool_experience | query_pattern | data_range | data_snapshot
→ High-decay types (data_range, data_snapshot): prefer rejection

Gate 4 — FRESHNESS (read)
Run a confidence scan before using knowledge:
→ Tool computes C(t) based on time elapsed and feedback history
→ TRUST (C ≥ 0.8): use directly
→ VERIFY (0.5 ≤ C < 0.8): use but flag for verification
→ REVALIDATE (C < 0.5): verify with tools first

Gate 4 — FRESHNESS (feedback)
After operations that used knowledge:
→ Success → record positive feedback (slows future decay)
→ Failure → record negative feedback (accelerates decay)
→ After revalidation passes → reset to fresh state

Gate 5 — PLACEMENT
Q: Which file does this belong in? Which memory layer?
→ Existing topic → add to that file
→ New topic → only create a new file if 3+ related entries exist; update _index.md

The most common outcome of the Five Gates is: do nothing. Most interactions don't produce knowledge worth storing. The protocol's primary job is to reject, not to accept.

| Capability | Mechanism |
|---|---|
| Add knowledge | Must pass all five gates |
| Correct errors | Gate 2 detects contradictions; fix in place |
| Deduplicate | Gate 3 merges rather than appends |
| Expire stale data | Gate 4 confidence decay model; tool-computed freshness with Bayesian feedback |
| Maintain structure | Gate 5 + scaling rules control file granularity |

| Level | Loaded when | Content | Change frequency |
|---|---|---|---|
| Level 1: frontmatter | Always, in system prompt | "When to use this Skill, how to behave" | Rarely changes |
| Level 2: body | When Claude judges the task is relevant | "Which to
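Taken together, the Five Gates behave like a short-circuiting filter chain whose default outcome is rejection. A minimal sketch in Python — the entry fields and gate predicates here are illustrative assumptions, not the skill's actual API:

```python
def five_gates(candidate: dict, knowledge_base: list[dict]) -> str:
    """Run a candidate piece of knowledge through the Five Gates."""
    # Gate 1 — VALUE: reject one-time results, keep reusable facts.
    if not candidate.get("reusable", False):
        return "REJECT"
    # Gate 2 — ALIGNMENT: a contradiction corrects the existing entry
    # in place rather than appending a competing claim.
    for entry in knowledge_base:
        if entry["topic"] == candidate["topic"] and entry["claim"] != candidate["claim"]:
            entry["claim"] = candidate["claim"]
            return "CORRECTED"
    # Gate 3 — REDUNDANCY: merge rather than append.
    for entry in knowledge_base:
        if entry["topic"] == candidate["topic"] and entry["claim"] == candidate["claim"]:
            return "MERGED"
    # Gate 4 — FRESHNESS (write): high-decay types are usually rejected.
    if candidate.get("type") in {"data_range", "data_snapshot"}:
        return "REJECT"
    # Gate 5 — PLACEMENT: store in the matching topic file.
    knowledge_base.append(candidate)
    return "STORED"

kb = [{"topic": "orders.status", "claim": "enum of 5 values", "type": "schema"}]
print(five_gates({"reusable": False}, kb))   # one-time result → REJECT
print(five_gates({"reusable": True, "topic": "orders.status",
                  "claim": "enum of 6 values", "type": "schema"}, kb))  # CORRECTED
```

Note how only the last branch actually grows the knowledge base — every earlier gate either rejects, corrects in place, or merges, which is what keeps the store from bloating.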
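The Gate 4 freshness policy can be read as exponential decay from C0, with feedback stretching or shrinking the effective half-life. A sketch under assumed parameters — the per-type half-lives and feedback multipliers below are invented for illustration and are not what `decay_engine.py` actually uses:

```python
import math

# Assumed per-type half-lives in days; the real config may differ.
HALF_LIFE_DAYS = {
    "schema": 180,
    "business_rule": 90,
    "tool_experience": 120,
    "query_pattern": 60,
    "data_range": 14,      # high-decay: prefer rejection at write time
    "data_snapshot": 7,
}

def confidence(knowledge_type: str, age_days: float,
               positives: int = 0, negatives: int = 0,
               c0: float = 1.0) -> float:
    """Compute C(t): exponential decay from C0, modulated by feedback.

    Positive feedback slows future decay (longer effective half-life);
    negative feedback accelerates it. The 25% step is an assumption.
    """
    half_life = HALF_LIFE_DAYS[knowledge_type]
    half_life *= (1.25 ** positives) * (0.75 ** negatives)
    return c0 * math.exp(-math.log(2) * age_days / half_life)

def action(c: float) -> str:
    """Map confidence to the read-time policy from Gate 4."""
    if c >= 0.8:
        return "TRUST"        # use directly
    if c >= 0.5:
        return "VERIFY"       # use but flag for verification
    return "REVALIDATE"       # verify with tools first

print(action(confidence("schema", age_days=0)))          # TRUST
print(action(confidence("data_snapshot", age_days=30)))  # REVALIDATE
```

Resetting to the fresh state after a successful revalidation corresponds to restarting the clock with `age_days=0` and cleared feedback counters.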

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
