Mycelium – Making AI agents validate the problem before writing any code

hackernews | 📦 Open source
#ai-agents #claude #mycelium #review #problem-validation #software-development #product-planning
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Mycelium presents a new approach that pushes AI coding agents to validate the problem before writing any code. The tool helps the AI surface requirement mismatches and logical contradictions before actual coding begins. As a result, it reduces unnecessary rework and contributes to a significant improvement in the accuracy and efficiency of software development.

Full Text

Your AI agent's product thinking partner. v0.11.0

Mycelium ensures you build the right thing, not just build the thing right. It guides AI agents through structured discovery, strategy, and delivery using 42+ established product frameworks, 12 theory gates, and 38 skills. Drop it into any Claude Code project and run /interview.

```shell
npx degit haabe/mycelium my-project && cd my-project
# Start Claude Code, then: /interview
```

Works for software, online courses, AI tools, and services. One command to start; the agent handles the rest.

In its first dogfood session, Mycelium forced a strategic pivot before any code was written: the founder's original positioning was invalidated by evidence the framework required them to gather. Without it, that mistake would have shipped.

AI coding agents are powerful but unguided. They'll jump from an idea to code without discovery, skip security, ignore accessibility, repeat past mistakes, and inflate their own confidence. Spec-driven tools (Kiro, Spec Kit) help structure the coding, but they start at "what to build," never "should you build it." Mycelium starts earlier and stays longer.

Click "Use this template" on GitHub, or:

```shell
npx degit haabe/mycelium my-project
cd my-project
```

Then start Claude Code and run /interview. The agent guides you through purpose, vision, users, strategy, and classifies your project to determine what matters. Alternatively, copy only the framework files:

```shell
npx degit haabe/mycelium/CLAUDE.md ./CLAUDE.md
npx degit haabe/mycelium/.claude ./.claude
```

Then start Claude Code and run /interview.

Mycelium is not a software library; it's a set of instructions that reshape agent behavior on every session start. Upgrading replaces framework files (skills, engine rules, hooks) while preserving your project state (canvas data, diamond state, decisions, memory).
Automated upgrade (recommended):

```shell
bash scripts/upgrade.sh          # upgrade to latest
bash scripts/upgrade.sh v0.12.0  # upgrade to specific version
```

The script reads .claude/manifest.yml to distinguish framework files from project state. It will never overwrite your populated canvas files, diamond state, decision log, or memory. It requires a clean git state (commit first) so you can revert if anything goes wrong.

After upgrading, run /diamond-assess to see your diamonds through the new version's lens. Gates may have become stricter or new fields may be expected; this is intentional, and /diamond-assess will tell you what needs attention.

Manual upgrade: see the upgrade checklist in .claude/manifest.yml for which files to replace vs. preserve.

Mycelium guides development through fractal diamonds: recursive Discover/Define/Develop/Deliver cycles (based on the Double Diamond by the Design Council) that operate at every scale:

- L0 Purpose: "Why do we exist?" (Sinek, Christensen)
- L1 Strategy: "Where do we play?" (Wardley, North Star, Skelton)
- L2 Opportunity: "What problem to solve?" (Torres, Allen, Cynefin)
- L3 Solution: "How to solve it?" (Gilad, Cagan, Downe)
- L4 Delivery: "Build and ship" (Forsgren, OWASP, SOLID)
- L5 Market: "Reach users" (Lauchengco, Shotton)

Each diamond has four phases: Discover (diverge) -> Define (converge) -> Develop (diverge) -> Deliver (converge). Diamonds spawn child diamonds when complexity requires it. Parent diamonds continue while children execute. If delivery reveals a bad assumption, the diamond regresses back with new evidence. This creates smooth flow from idea to delivery without artificial phase breaks.

Diamond lifecycle: diamonds can be active, blocked, archived, or killed. Stale diamonds (30+ days without progress) are flagged for review.
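The stale-diamond rule above amounts to a simple date comparison. A minimal sketch (function and constant names are hypothetical, for illustration only):

```python
from datetime import date, timedelta

# Per the stale-diamond rule: flagged after 30+ days without progress.
STALE_AFTER = timedelta(days=30)

def is_stale(last_progress: date, today: date) -> bool:
    """Return True if a diamond should be flagged for review."""
    return today - last_progress >= STALE_AFTER

assert is_stale(date(2024, 1, 1), date(2024, 2, 5))       # 35 days: flagged
assert not is_stale(date(2024, 1, 1), date(2024, 1, 20))  # 19 days: fine
```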
Every diamond transition must pass theory gates: not just a confidence score, but evidence checks grounded in specific frameworks:

| Gate | What It Checks | Source | Suggested Skill |
|---|---|---|---|
| Evidence | Research-backed? Multiple sources? | Torres, Gilad | /user-interview, /assumption-test |
| Four Risks | Value, usability, feasibility, viability assessed? | Cagan | /assumption-test |
| JTBD | Emotional and social dimensions mapped? | Christensen | /jtbd-map |
| Cynefin | Domain classified? Method appropriate? | Snowden | /cynefin-classify |
| Bias Check | Research designed to mitigate cognitive biases? | Shotton, Kahneman | /bias-check |
| Security | Threat model updated? | OWASP (STRIDE) | /threat-model, /security-review |
| Privacy | Privacy assessed? Data minimized? | Cavoukian (PbD), GDPR | /privacy-check |
| BVSSH | Aligned with Better, Value, Sooner, Safer, Happier? | Smart | /bvssh-check |
| Service Quality | Downe's 15 principles checked? | Downe | /service-check, /a11y-check |
| DORA | Delivery metrics healthy? | Forsgren | /dora-check |
| CALMS | DevOps culture healthy? | Humble | /bvssh-check |
| Corrections | Past mistakes reviewed? | Mycelium self-learning | /preflight, /reflexion |

If ANY gate fails, the agent reports which gates failed, cites the theory, suggests the skill to run, and recommends the smallest action to satisfy each, but does NOT proceed.
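The all-or-nothing rule for gate evaluation can be sketched as follows (the GateResult structure and function names are hypothetical, not Mycelium's actual internals):

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    name: str             # e.g. "Evidence", "Security"
    passed: bool
    theory: str           # framework the gate is grounded in
    suggested_skill: str  # skill to run if the gate fails

def evaluate_transition(results: list[GateResult]) -> tuple[bool, list[str]]:
    """If ANY gate fails, block the transition and report each failure."""
    failures = [r for r in results if not r.passed]
    if not failures:
        return True, ["All gates passed; transition allowed."]
    report = [
        f"Gate '{r.name}' failed ({r.theory}); smallest fix: run {r.suggested_skill}"
        for r in failures
    ]
    return False, report

ok, report = evaluate_transition([
    GateResult("Evidence", False, "Torres, Gilad", "/assumption-test"),
    GateResult("Security", True, "OWASP (STRIDE)", "/threat-model"),
])
assert not ok  # a single failing gate blocks the transition
```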
All product knowledge lives in .claude/canvas/*.yml: 21 structured YAML files that serve as the single source of truth:

| Canvas File | What It Captures | Source Theory |
|---|---|---|
| purpose.yml | Why/How/What, ethical boundaries | Sinek |
| north-star.yml | North Star metric + input metrics | North Star Framework |
| bvssh-health.yml | Better/Value/Sooner/Safer/Happier | Smart |
| landscape.yml | Wardley Map components + evolution | Wardley |
| team-shape.yml | Team types, cognitive load, interactions | Skelton |
| opportunities.yml | Opportunity Solution Tree | Torres |
| user-needs.yml | User needs map (functional/emotional/social) | Allen |
| gist.yml | Goals, Ideas, Steps, Tasks | Gilad |
| services.yml | 15 Good Services principles assessment | Downe |
| go-to-market.yml | Positioning, launch tiers, GTM motion | Lauchengco |
| dora-metrics.yml | Four key delivery metrics | Forsgren |
| threat-model.yml | STRIDE threat model per component | OWASP |
| privacy-assessment.yml | Privacy by Design / GDPR assessment | Cavoukian |
| trust-signals.yml | Trust architecture, transparency | Digital Trust |
| jobs-to-be-done.yml | JTBD map (functional/emotional/social) | Christensen |
| bounded-contexts.yml | DDD bounded contexts and context map | Evans (DDD) |
| value-stream.yml | End-to-end flow, wait times, bottlenecks | Rother & Shook (VSM) |
| content-metrics.yml | Content delivery metrics (courses, publications, media) | v0.11.0 |
| ai-tool-metrics.yml | AI tool delivery metrics (prompts, models, agents) | v0.11.0 |
| service-metrics.yml | Service delivery metrics (consulting, coaching) | v0.11.0 |
| human-tasks.yml | Offline human task tracking (interviews, outreach) | v0.11.0 |

Canvas files are committed to git. They ARE your product documentation. Not all canvas files are required for every project: the /interview skill classifies your project type and product type, then tells you which canvas files to focus on.
Guardrails (.claude/harness/guardrails.md): 30 constraints across three enforcement tiers:

- BLOCK (2): Mechanically prevented by hooks. Secrets in code (G-S1), stale corrections (G-P5).
- REVIEW (13): /diamond-progress refuses to complete delivery until satisfied. Tests, a11y, usability (Nielsen), service quality (Downe), threat modeling, input validation, BVSSH, decision logging, error states, canvas updates, AI disclosure, regulatory awareness, privacy.
- NUDGE (15): Nudged by hooks, not blocking. Engineering principles (DRY, KISS, YAGNI, XP values), bias checks, data minimization, secure defaults, theory citations, devil's advocate, BVSSH dimensions, sustainable pace.

Anti-Patterns (.claude/harness/anti-patterns.md): 32 known failure modes across discovery, confidence, security, delivery, market/GTM, and strategic systems thinking, including "Solution-first thinking", "Confidence inflation", "Security-later", "Dark pattern marketing", "Regression avoidance", and more.

Cognitive Biases (.claude/harness/cognitive-biases.md): Per-stage bias checklist based on Shotton and Kahneman. 20+ biases mapped across L0-L5, including the agent's own biases.

Security & Trust (.claude/harness/security-trust.md): Per-stage security requirements from OWASP, STRIDE, and Privacy by Design.

Engineering Principles (.claude/harness/engineering-principles.md): Explicit rules for DRY, KISS, YAGNI, SoC, SOLID, Law of Demeter, Clean Code.

Mycelium enforces guardrails through hooks that fire automatically at different points:

| Layer | Event | What It Does | Cost |
|---|---|---|---|
| 1. PreToolUse gate | Before code edits | Preflight check + secret detection (blocks hardcoded API keys, tokens) | ~30 tokens |
| 2. PostToolUse nudge | After code edits | Context-aware reminders (a11y for UI files, OWASP for API files) | ~50 tokens |
| 3. PostToolUseFailure | After failures | Reflexion analysis: diagnose root cause before retrying | ~200 tokens |
| 4. Stop check | Session end | Canvas gap detection, overdue feedback loop warnings | ~50 tokens |
| 5. SessionStart check | Session start/resume | Reminds about overdue strategic reviews (BVSSH, DORA) | ~50 tokens |
| 6. Skill-level gates | On demand | Full 11-gate theory evaluation via /diamond-progress | varies |

Total overhead: ~6,000 tokens/session (negligible vs. typical 50K-200K sessions).

The feedback loops are based on Gene Kim's Three Ways, Argyris's double-loop learning, and Meadows's leverage points:

| Loop | Speed | Purpose | Key Mechanisms |
|---|---|---|---|
| 1. Immediate | Seconds | Fix the error (single-loop) | Reflexion, secret detection, corrections matching |
| 2. Incremental | Hours/days | Improve the process (single-loop + memory) | Phase learnings, DORA metrics, retrospectives |
| 3. Strategic | Weekly/monthly | Question the assumptions (double-loop) | BVSSH health, North Star trajectory, Wardley refresh |
| 4. Transformative | Quarterly | Improve the system itself (triple-loop) | Eval benchmarks, prompt optimization |

Run /feedback-review to check health across all loops. It includes regression warning triggers (e.g., "DORA declined twice in a row") and Goodhart's Law protection (counter-metrics for every tracked metric).

The L5 -> L2 feedback loop: after launch, market signals feed back into new L2 Opportunity diamonds, closing the full cycle: Purpose -> Strategy -> Discovery -> Solution -> Delivery -> Market -> Discovery.

- Corrections Memory: accumulated learning from mistakes. Read before every task. Pruned when > 30 entries.
- Pattern Library: successful patterns to reuse across diamonds.
- Reflexion Loop: implement, validate, self-critique, retry (max 3 iterations).
- Eval Benchmarks: 6 scenarios across discovery, delivery, and integration categories.
- Prompt Optimization: A/B testing of instruction changes against eval baselines.
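The Layer-1 secret detection described above boils down to pattern-matching proposed edits before they land. A minimal sketch; the patterns below are common illustrative examples, not Mycelium's actual rules:

```python
import re

# Illustrative patterns only; the real hook's rules live inside Mycelium.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # OpenAI-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]

def contains_secret(edit_text: str) -> bool:
    """PreToolUse-style gate: True means the edit should be blocked."""
    return any(p.search(edit_text) for p in SECRET_PATTERNS)

assert contains_secret('API_KEY = "abcd1234efgh5678"')        # hardcoded: blocked
assert not contains_secret('api_key = os.environ["API_KEY"]')  # env lookup: allowed
```

Reading secrets from the environment passes because the value itself never appears in the code, which is exactly what a BLOCK-tier guardrail wants to enforce.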
Mycelium communicates in human language, not framework jargon:

- "Discovering what problems to solve", not "L2 Opportunity Discover phase"
- "Confidence: Moderate, based on 2 user interviews", not "Confidence: 0.5"
- After each phase, the agent offers to capture learnings

- Day-to-day: session resumption via /diamond-assess, corrections review via hooks
- Weekly: diamond state review, canvas updates
- Monthly: BVSSH health check, Wardley map review, stale diamond cleanup
- Quarterly: North Star review, strategic landscape refresh, eval benchmarks
- Escape hatch: documented bypass process for emergencies (production incidents, hotfixes) with mandatory payback

| Skill | When to Use |
|---|---|
| /interview | Onboarding: purpose, vision, North Star, project classification |
| /diamond-assess | Current state in plain language, recommended next action |
| /diamond-progress | Move diamond forward with theory gates + skill suggestions |

| Skill | When to Use |
|---|---|
| /user-interview | Torres-style story-based interviews with bias mitigation |
| /mocked-persona-interview | Disciplined mocked personas for solo/hobby/dogfood projects (speculation-tagged, stop-condition gated) |
| /user-needs-map | Allen's methodology: map needs independently of solutions |
| /ost-builder | Build/update Opportunity Solution Tree from research |
| /jtbd-map | Jobs to be Done (functional, emotional, social) |
| /assumption-test | Design smallest viable test for an assumption |
| /cynefin-classify | Classify problem domain |
| /wardley-map | Create/update Wardley Map of value chain |
| /ice-score | Prioritize with ICE scoring + confidence meter |
| /gist-plan | GIST planning: goals, ideas, steps, tasks |
| /handoff | Generate structured handoff for offline human tasks |
| /log-evidence | Record findings from completed offline conversations |

| Skill | When to Use |
|---|---|
| /bias-check | Review cognitive biases before research/decisions |
| /devils-advocate | Challenge assumptions before major decisions |
| /bvssh-check | Holistic BVSSH health evaluation |
| /service-check | Downe's 15 Good Services principles |
| /threat-model | STRIDE threat modeling |
| /privacy-check | Privacy by Design / GDPR assessment |
| /security-review | OWASP secure design review |
| /usability-check | Nielsen's 10 usability heuristics (interface-level) |
| /a11y-check | Accessibility audit (WCAG 2.1 AA) |

| Skill | When to Use |
|---|---|
| /delivery-bootstrap | Auto-detect tech stack, set up tooling |
| /preflight | Pre-code validation checklist |
| /reflexion | Self-correcting implementation loop |
| /definition-of-done | Verify all DoD criteria |
| /dora-check | DORA delivery performance metrics |
| /retrospective | Post-delivery learning capture |

| Skill | When to Use |
|---|---|
| /launch-tier | Classify releases, plan go-to-market |
| /team-shape | Team Topologies assessment |

| Skill | When to Use |
|---|---|
| /canvas-update | Update canvas with new evidence |
| /canvas-sync | Synchronize canvas across team via git |
| /fan-out | Parallel agent orchestration for OST exploration |

| Skill | When to Use |
|---|---|
| /feedback-review | Aggregate all feedback signals, check health across 4 loops |
| /eval-runner | Run benchmark scenarios |
| /prompt-optimizer | A/B test instruction changes |

| Theory/Framework | Author(s) | Applied To |
|---|---|---|
| Golden Circle (Start with Why) | Sinek | L0: Purpose, mission, values |
| Jobs to be Done | Christensen, Ulwick | L0-L2: Functional/emotional/social needs |
| Wardley Mapping | Wardley | L1: Strategic landscape, evolution |
| North Star Framework | Ellis, Amplitude | L1: Key metric + input metrics |
| Team Topologies | Skelton, Pais | L1: Team structure, cognitive load |
| Continuous Discovery Habits / OST | Torres | L2: Opportunity discovery, assumption testing |
| User Needs Mapping | Allen | L2: User needs independent of solutions |
| Cynefin Framework | Snowden | L2-L4: Domain classification, method selection |
| GIST Planning / ICE Scoring | Gilad | L3: Evidence-guided prioritization |
| Inspired / Empowered | Cagan | L3: Four risks, empowered teams |
| Good Services (15 Principles) | Downe | L3-L4: Service design quality |
| Accelerate / DORA Metrics | Forsgren, Humble, Kim | L4: Delivery performance measurement |
| OWASP Secure by Design
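Among the skills above, /ice-score applies Gilad's ICE prioritization. A minimal sketch of the arithmetic, assuming the common product form (Impact x Confidence x Ease, each on a 1-10 scale; the skill's actual scale and scoring may differ):

```python
def ice_score(impact: float, confidence: float, ease: float) -> float:
    """ICE prioritization: each input expected on a 1-10 scale."""
    for v in (impact, confidence, ease):
        if not 1 <= v <= 10:
            raise ValueError("ICE inputs are expected on a 1-10 scale")
    return impact * confidence * ease

# Hypothetical ideas, for illustration only.
ideas = {
    "onboarding-revamp": ice_score(8, 6, 4),  # high impact, low ease -> 192
    "dark-mode": ice_score(3, 9, 9),          # low impact, easy win  -> 243
}
best = max(ideas, key=ideas.get)
assert best == "dark-mode"  # higher ICE score wins under this model
```

The point of pairing the score with a confidence meter, as the skill description suggests, is that a high ICE score backed by weak evidence should still trigger the Evidence gate rather than go straight to delivery.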

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
