Show HN: Union-Find for chatbot memory instead of flat compaction
hackernews
🔬 Research
#ai
#review
#union-find
#memory
#compaction
#chatbot
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
To address the problem of chatbot conversations outgrowing the context window, a new compaction scheme is proposed that applies Union-Find, a fifty-year-old algorithm. By grouping related messages and merging them into clusters, the scheme preserves traceability back to the original messages and the ability to recover them, which plain summarization cannot guarantee. In an experiment on a synthetic DevOps conversation, compacting 200 messages with the new scheme improved fact recall over flat summarization by 8-18 percentage points.
Full text
Union-Find Compaction

Part of the cognition series. Builds on The Natural Framework, General Intelligence, and The Handshake.

Every chatbot that compacts stalls the entire conversation to do it. But does it need to?

Every chatbot has the same context problem: conversations grow, the window doesn't. When it fills up, the system summarizes and discards. Run a cheap model over everything outside a hot window, truncate to budget. The summary replaces the source. Traceability, expandability, selective retrieval: all gone. Compression is coarse. The summary replaces every source, so it can't trace a claim back to the original message.

Chatbot layer from Diagnosis LLM. Full diagnostic: six roles, three layers, nine dysfunction cells.

A convenient fix

Union-find has been around for half a century. Tarjan (1975) proved its amortized cost: find (follow parent pointers to root, compress path) and union (merge two trees, balance by rank), both near-O(1). Fast, and it never forgets which elements belong together.

The insight: use union-find as the provenance spine for context compaction. Each message starts as a singleton. Similar messages merge into equivalence classes. Each union is a cheap LLM call: take two small summaries, produce one. The calls were going to happen anyway. The contribution is routing them through a structure that gives you amortized lookup and canonical roots for free.

Five properties follow:
- Provenance. Every summary traces back to its source messages through find(). You can audit any summary.
- Recoverability. expand(root_id) reinflates a cluster back to its sources. Raw messages stay addressable through the parent pointers. Flat summarization discards them.
- Incremental. Messages graduate one at a time. Each graduation is near-O(1). No batch stall, no latency spike. Flat summarization reprocesses the entire history at once.
- Cheaper. Each union feeds a small cluster (5-20 messages) to the summarizer. Smaller prompts, smaller models. Haiku produced good per-cluster summaries.
- Persistent. The forest serializes naturally. Parent pointers are just integers. Save it, load it next session, clusters intact.

Compound cache

The implementation splits context into two zones:
- Hot: the last 10 messages, served raw. When the window overflows, the oldest message graduates to cold.
- Cold: everything older, managed by the union-find forest.

On graduation, the new message keeps its timestamp, gets a TF-IDF vector (cheap, local, accurate enough), and is compared against cluster centroids by cosine similarity. Above a merge threshold (0.15), it joins that cluster via union; below, it starts a new singleton. A hard cap (default 10 clusters) forces the closest pair to merge when exceeded. Centroids update as weighted averages. That's the write path.

The read path: vectorize the incoming message, find the nearest e-class root, inject that cluster's summary alongside the hot window. Unlike RAG, the summaries are pre-merged at write time. Context stays bounded. (Both paths are sketched in code below.)

Eviction from a flat summary already happened, invisibly, at compression time. Eviction from a structure with provenance is a policy choice. That choice is deferred here: storage grows monotonically, the forest only adds nodes, and the underlying store accumulates indefinitely. Eventually you drop stale clusters or archive old source messages.

Experiment

github.com/kimjune01/union-find-compaction: code, data, and full trial logs.
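Before the experiment walkthrough, here is a minimal sketch of the forest and the compound cache described above. All names (SummaryForest, graduate, retrieve, merge_summaries) are illustrative assumptions, not the repo's actual API; vectors are assumed to be locally computed sparse TF-IDF dicts, and merge_summaries stands in for the cheap per-cluster LLM call (e.g., Haiku).

```python
class SummaryForest:
    """Union-find with path compression and union by rank; roots carry cluster state."""

    def __init__(self):
        self.parent = []    # parent[i] == i at a root
        self.rank = []      # rank bound for union by rank
        self.messages = []  # raw source messages stay addressable by id
        self.summary = {}   # root id -> running cluster summary
        self.centroid = {}  # root id -> sparse TF-IDF centroid {term: weight}
        self.size = {}      # root id -> number of source messages in the cluster

    def add(self, text, vector):
        """A newly graduated message starts as a singleton cluster."""
        i = len(self.parent)
        self.parent.append(i)
        self.rank.append(0)
        self.messages.append(text)
        self.summary[i] = text      # a singleton's "summary" is just the message
        self.centroid[i] = vector
        self.size[i] = 1
        return i

    def find(self, i):
        """Follow parent pointers to the root, halving the path (near-O(1) amortized)."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, a, b, merge_summaries):
        """Merge two clusters; merge_summaries stands in for the cheap LLM summary merge."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra                  # keep ra as the higher-rank root
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        wa, wb = self.size[ra], self.size[rb]
        merged = {}                          # centroid updates as a weighted average
        for term in set(self.centroid[ra]) | set(self.centroid[rb]):
            merged[term] = (wa * self.centroid[ra].get(term, 0.0)
                            + wb * self.centroid[rb].get(term, 0.0)) / (wa + wb)
        self.centroid[ra] = merged
        self.size[ra] = wa + wb
        self.summary[ra] = merge_summaries(self.summary[ra], self.summary.pop(rb))
        self.centroid.pop(rb)
        self.size.pop(rb)
        return ra

    def expand(self, root_id):
        """Reinflate a cluster back to its source messages via find()."""
        return [self.messages[i] for i in range(len(self.parent)) if self.find(i) == root_id]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = (sum(w * w for w in u.values()) ** 0.5) * (sum(w * w for w in v.values()) ** 0.5)
    return num / den if den else 0.0


def graduate(forest, text, vector, merge_summaries, merge_threshold=0.15, max_clusters=10):
    """Write path: the oldest hot message moves to cold and joins (or starts) a cluster."""
    new_id = forest.add(text, vector)
    others = [r for r in forest.summary if r != new_id]
    if others:
        sims = {r: cosine(vector, forest.centroid[r]) for r in others}
        best = max(sims, key=sims.get)
        if sims[best] >= merge_threshold:
            forest.union(new_id, best, merge_summaries)
    while len(forest.summary) > max_clusters:   # hard cap: force the closest pair to merge
        roots = list(forest.summary)
        pairs = [(cosine(forest.centroid[a], forest.centroid[b]), a, b)
                 for i, a in enumerate(roots) for b in roots[i + 1:]]
        _, a, b = max(pairs)
        forest.union(a, b, merge_summaries)


def retrieve(forest, query_vector):
    """Read path: inject the nearest cluster's pre-merged summary alongside the hot window."""
    if not forest.summary:
        return None
    best = max(forest.summary, key=lambda r: cosine(query_vector, forest.centroid[r]))
    return forest.summary[best]
```

The sketch keeps the parent pointers as plain integers, so the whole forest serializes trivially between sessions, and expand() stays available for any root.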
The git history shows the thought process: each trial's design decisions, what changed between iterations, and why.

Does union-find compaction actually preserve more detail than flat summarization, given the same token budget? If not, the provenance argument is academic. If so, everyone gets better recall, and lineage comes free.

A synthetic 200-message DevOps conversation, seeded with 40 verifiable facts. Same cheap summarizer (Haiku), same budget, same retrieval machinery for both methods. A strict LLM judge scores binary recall: "PostgreSQL 16.2" counts, "PostgreSQL" doesn't. McNemar's test on discordant pairs. Seven trials.

Results

| # | Config | Flat | UF | p |
|---|---|---|---|---|
| 1 | Haiku, 50 | 90% | 90% | 1.000 |
| 2 | Haiku, 200 | 65% | 82% | 0.039 |
| 3 | Sonnet, 200 | 70% | 78% | 0.453 |
| 4 | Haiku, 200, retrieval | 68% | 82% | 0.180 |
| 5 | Haiku, 200, tuned | 62% | 80% | 0.065 |
| 6 | Haiku, 200, timestamps | 72% | 90% | 0.092 |
| 7 | Haiku+Sonnet, 200 | 75% | 90% | 0.070 |

At low compression (50 messages), no difference. At 200 messages, UF leads by 8-18pp across every trial. Trial 2 is the only one that clears significance (p = 0.039, McNemar's); the rest are directional but underpowered at n = 40. Trial 7 mimics production: a cheap model summarizes, an expensive model answers. The expensive model can't recover facts the cheap summary already dropped.

What flat drops

Flat summarization preserves headline facts (PostgreSQL 16.2, bcrypt auth) and drops footnote facts: the scrape interval, the cron schedule, the we
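For reference, the p column in the results table above comes from an exact McNemar test on the discordant pairs (facts one method recalled and the other missed). A minimal sketch of that computation follows; the counts used are hypothetical, since the real per-fact outcomes live in the repo's trial logs.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant pairs.
    b = facts only flat summarization recalled, c = facts only union-find recalled.
    Under H0 the discordant pairs split 50/50, so p is a two-sided binomial tail."""
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only (not taken from the trial logs):
print(round(mcnemar_exact(b=2, c=8), 3))  # -> 0.109
```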
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.