Claude built a system over 3 rounds, and latent bugs from round 1 exploded in round 3.

Source: hackernews · Summarized and analyzed by Genesis Park

Summary

The article reports four progressively complex benchmarks that compare a traditional AI-agent-built codebase with Mycelium, an approach built on schema-enforced cells and manifests. The traditional approach passed every test through Order Lifecycle V2, but a shipping bug introduced in V1 stayed latent for two rounds and caused 17 failures in V3 (366/383), while Mycelium passed 383/383. The latent bugs came from implicit contracts between independently written components, such as a `:shipping-detail` vs `:shipping-groups` key mismatch that silently produced a $0 shipping refund.

Body

How schema-enforced cells change the reliability equation as system complexity grows.

We ran four progressively complex benchmarks, each building on the previous one. The traditional approach achieved 100% test passage through V1 and V2, but the V1 shipping bug finally surfaced in V3, causing 17 test failures after being latent for two full rounds of development.

| Benchmark | Subsystems | Tests / assertions | Traditional | Mycelium | Traditional LOC | Mycelium LOC |
|---|---|---|---|---|---|---|
| Checkout Pipeline | 3 | 8 / 39 | 39/39 pass | 39/39 pass | ~130 | ~230 + manifest |
| Order Lifecycle V1 | 6 | 18 / 136 | 136/136 pass | 136/136 pass | ~540 | ~590 + ~130 manifest |
| Order Lifecycle V2 | 11 | 30 / 235 | 235/235 pass | 235/235 pass | ~722 | ~900 + ~360 manifest |
| Order Lifecycle V3 | 15 | 52 / 383 | 366/383 pass | 383/383 pass | ~920 | ~1146 + ~440 manifest |

Checkout Pipeline

Scale: ~130 lines, 3 subsystems (discounts, tax/shipping, payment).

What it tests: A single linear pipeline: items in, total out.

Traditional:
- 37/39 assertions passed on first attempt
- 1 bug: floating-point rounding in tax calculation (`50.0 * 0.0725 = 3.6249...` rounds to 3.62 instead of 3.63)
- Fixed in 1 iteration

Mycelium:
- 39/39 assertions passed on first logic execution (zero logic bugs)
- 4 iterations of compiler-guided structural fixes before first run: missing `:on-error`, undeclared data flow keys, dead-end graph route
- Each fix was guided by a clear error message

Both approaches work fine. The problem is small enough that an AI agent can hold the entire system in context. The traditional approach is simpler and faster to implement. Mycelium's overhead (~100 extra lines + manifest) is proportionally high (~75%) and hard to justify for a problem this small.

Mycelium advantage: The structural validators caught 3 issues that would have been silent in the traditional approach (missing error handling, undeclared data flow, dead-end route).
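To make the three structural issues concrete, here is a minimal Python sketch of a validator for a linear workflow. It is an illustration under assumptions, not Mycelium's actual manifest format or compiler: the `WORKFLOW` dictionary layout, the `validate` function, and the `shipping` and `audit` names are all hypothetical.

```python
# Hypothetical workflow description (NOT Mycelium's manifest format).
# Each cell declares the keys it requires, the keys it produces,
# the next step, and an error handler.
WORKFLOW = {
    "start": "discounts",
    "inputs": {"items"},
    "terminals": {"done", "fail"},
    "cells": {
        "discounts": {"requires": {"items"}, "produces": {"subtotal"},
                      "next": "tax", "on-error": "fail"},
        # Issue 1: no error handler declared.
        "tax": {"requires": {"subtotal"}, "produces": {"tax"},
                "next": "payment", "on-error": None},
        # Issue 2: "shipping" is never produced upstream.
        # Issue 3: "audit" is neither a cell nor a terminal.
        "payment": {"requires": {"subtotal", "tax", "shipping"},
                    "produces": {"total"},
                    "next": "audit", "on-error": "fail"},
    },
}


def validate(workflow):
    """Walk a linear workflow and collect structural errors without running it."""
    errors = []
    available = set(workflow["inputs"])  # keys known to exist so far
    visited = set()
    name = workflow["start"]
    while name not in workflow["terminals"]:
        cell = workflow["cells"].get(name)
        if cell is None:
            errors.append(f"dead-end route: '{name}' is neither a cell nor a terminal")
            break
        if name in visited:
            errors.append(f"cycle detected at '{name}'")
            break
        visited.add(name)
        if not cell.get("on-error"):
            errors.append(f"{name}: missing on-error handler")
        undeclared = sorted(cell["requires"] - available)
        if undeclared:
            errors.append(f"{name}: undeclared data flow keys {undeclared}")
        available |= cell["produces"]
        name = cell["next"]
    return errors


if __name__ == "__main__":
    for message in validate(WORKFLOW):
        print(message)
```

Run against this deliberately broken workflow, the check reports one message per issue before any business logic executes, which is the property the structural validators are credited with here.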
But at this scale, these issues are easy to catch through testing or code review.

Latent bugs: Traditional 0, Mycelium 0.

Order Lifecycle V1

Scale: ~540 lines, 6 subsystems (item expansion, promotions with 5 stacking types, per-item tax with state exemptions, multi-warehouse shipping, split payment, loyalty points with tiered earning).

What it tests: Three interacting workflows (placement, returns, modification) that share data contracts. Returns must correctly reverse the forward calculation, including proportional discount distribution, per-item tax, and split payment reversal.

The traditional approach was built by 4 separate AI subagents:
- Agent 1: Order placement (~342 lines)
- Agent 2: Returns processing (~136 lines), given spec + tests, no placement source
- Agent 3: Order modification (~58 lines), given spec + tests, no other source
- Agent 4: Added COMBO75 feature by modifying code it didn't write

Traditional:
- 18/18 tests passing, 136 assertions
- 2 latent bugs discovered:
  - `:shipping-detail` vs `:shipping-groups`: Returns code destructures `:shipping-detail` but placement outputs `:shipping-groups`. Gets `nil`, silently produces $0 shipping refund for all defective returns
  - Double inventory reservation: `modify-order` calls `place-order`, which re-reserves inventory without releasing the original reservation

Mycelium:
- 18/18 tests passing on first attempt, 136 assertions
- 0 latent bugs
- Returns manifest explicitly requires `:shipping-groups :any` in its input schema, making the key mismatch impossible
- Modification workflow uses the same placement workflow, so inventory semantics are consistent

This is the tipping point. The traditional approach has crossed the threshold where implicit contracts between components fail silently. Two independently competent AI agents (placement and returns) produced code that connects incorrectly through a key name mismatch. All tests pass because no test exercises the specific path (defective return of an item with non-zero shipping cost).
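The key mismatch can be reproduced in a few lines. The Python sketch below is a hypothetical reconstruction, not the benchmark's code or Mycelium's API: `place_order`, `refund_shipping_implicit`, `validate_wiring`, and the sample shipping cost are all assumptions made for illustration. It contrasts a silent missing-key lookup with a wiring check over declared input and output keys.

```python
# Hypothetical reconstruction of the key mismatch (not the benchmark's code).

def place_order(order):
    """Placement side: shipping data is emitted under 'shipping-groups'."""
    return {
        "order-id": order["order-id"],
        "shipping-groups": [{"warehouse": "east", "cost": 8.50}],
    }


def refund_shipping_implicit(placed):
    """Returns side, written against the wrong key.

    The lookup finds nothing, falls back to an empty list, and the
    refund silently becomes 0 instead of raising an error.
    """
    groups = placed.get("shipping-detail") or []
    return sum(group["cost"] for group in groups)


# Explicit contracts: the keys each side declares (assumed shape).
PLACEMENT_OUTPUTS = {"order-id", "shipping-groups"}
RETURNS_INPUTS_WRONG = {"order-id", "shipping-detail"}
RETURNS_INPUTS_RIGHT = {"order-id", "shipping-groups"}


def validate_wiring(producer_outputs, consumer_inputs):
    """Fail at wiring time if the consumer needs a key the producer never emits."""
    missing = sorted(consumer_inputs - producer_outputs)
    if missing:
        raise ValueError(f"consumer requires keys the producer does not output: {missing}")


def refund_shipping_checked(placed):
    """Returns side that validates its declared inputs before computing."""
    validate_wiring(set(placed), RETURNS_INPUTS_RIGHT)
    return sum(group["cost"] for group in placed["shipping-groups"])


if __name__ == "__main__":
    placed = place_order({"order-id": 1})
    print(refund_shipping_implicit(placed))  # silent wrong answer
    print(refund_shipping_checked(placed))   # refund computed from the declared key
    try:
        validate_wiring(PLACEMENT_OUTPUTS, RETURNS_INPUTS_WRONG)
    except ValueError as exc:
        print(exc)
```

The design point is that the wiring check depends only on declared keys, so it fails even when no test exercises the defective-return path.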
The key insight: The bug is not in any single agent's work. Each agent's code is internally correct. The bug is in the contract between agents, a contract that exists only implicitly in the traditional approach and explicitly in the Mycelium manifest.

Latent bugs: Traditional 2, Mycelium 0.

Order Lifecycle V2

Scale: ~722 lines, 11 subsystems. Five new features added by 5 sequential AI subagents, each modifying the existing V1 codebase:
- Bulk pricing (quantity discounts before all other promos)
- Store credit (third payment method in 3-way waterfall)
- Gift wrapping (per-item, separate tax rate, refund rules)
- Restocking fees (category-dependent on changed-mind returns)
- Multi-currency (display-time conversion with separate display fields)

Traditional:
- 30/30 tests passing, 241 assertions
- 4 latent bugs (2 carried forward from V1 + 2 new):
  - `:shipping-detail` vs `:shipping-groups`: still present after 5 more agents touched the code
  - Double inventory reservation: still present
  - `currency-rates` duplicated in 3 files (placement, returns, modification)
  - `gift-wrap-cost-per-item` duplicated in

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
