Bella: Hypergraph Memory for AI Agents (10x Time Horizon)

hackernews | 📦 open source
#ai models #ai agents #bellamem #claude #gpt-4 #hypergraph #openai #memory systems #continual learning
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Bella (package name bellamem) is a long-term memory layer built to address the failure modes of AI coding agents: context-window limits and session resets make them lose prior context and re-propose approaches that were already rejected. The tool extracts decisions, rejected approaches, and causal relations from conversations into a belief graph structured as a hypergraph, preserving information across session boundaries. In benchmarks, conventional RAG top-k retrieval reached only 31% judge accuracy, while Bella's structured expand retrieval scored an excellent 92%. Users save and load manually at session start and end, and a pre-edit hook can automatically block the agent when it tries to reuse a previously rejected approach.

Full Text

Continuous hypergraph memory for AI agents. Across sessions, across tasks, across domains. Bella is the visual brand; the Python package and CLI remain `bellamem`: `pipx install bellamem`, then `bellamem save`.

You use a coding agent every day. You already know the failure modes.

**It loses continuity.** You spent yesterday nailing down why a test keeps flaking. Today's session starts fresh — and suggests bumping the timeout, the exact bandaid you explicitly rejected yesterday. Yesterday doesn't exist.

**It hits the wall.** A long session fills the context window. You `/compact` to buy room; you `/clear` to start over. Either way the specifics evaporate — the rejected approaches, the causal chains, the small invariants that took ten messages to earn.

**It forgets mid-stream.** Even inside a single session, the agent loses what it agreed to twenty turns ago. You re-brief, re-anchor, re-explain. Dementia in slow motion: it's talking to you with the same fluency, but the thing you told it at message 5 isn't in its head anymore at message 50.

**It confabulates with conviction.** Worst of all, when it forgets, it doesn't say "sorry, remind me." It confidently re-proposes the rejected approach like it's new. It asserts a falsehood firmly. It isn't asking to be re-briefed — it's wrong with full confidence, and you're the one who has to catch it every time.

One root cause: coding agents only have working memory. The context window is the memory. When a turn falls out, it falls out forever — and the agent carries on unaware, filling the gap with plausible-sounding defaults.

Bella is the long-term memory layer. It runs alongside the agent and extracts the structure of every conversation — decisions, rejected approaches, causes, self-observations — into a belief hypergraph that survives `/clear`, new sessions, and new days. When tomorrow's session asks about the flaky test, it doesn't guess. It loads what yesterday actually decided, what yesterday rejected, and why.

Consider a real debugging session — twenty turns of dead-ends, side-questions, acknowledgments, and the actual fix at the end. The left column is what the context window holds. The right column is what Bella extracts alongside it — and what survives after `/clear`.

| Flat session — what the context window holds | Bella hypergraph — what survives |
|---|---|
| ~220 tokens · 20 turns · ordered by time · dies at `/clear` | ~50 tokens · 4 beliefs · ordered by evidence mass · persists |

Same information content, different geometry. The left column lets an agent reconstruct what was said. The right column lets it reconstruct what was decided, what was rejected, and what caused what — in far fewer tokens, and across the session boundaries where the left column can't go. And the four items on the right are exactly the ones the agent would otherwise forget, re-suggest, or confabulate about tomorrow: a ratified decision (mass earned from two voices), a causal chain (the why), a dispute (the rejected bandaid, which Bella's edit guard will block if the agent tries it again), and a self-observation about its own reasoning pattern.
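To make that vocabulary concrete, here is a minimal illustrative data model — hypothetical field names, not Bella's actual `core/gene.py`. The document names ⊥ as the dispute edge and ⇒ as the cause edge; `->` is rendered below as a generic link, and the example contents (the pinned seed, the timeout bandaid) are invented to echo the flaky-test session:

```python
# Illustrative sketch of what a belief node might carry. Hypothetical names;
# not Bella's core/gene.py.
from dataclasses import dataclass, field

@dataclass
class Belief:
    text: str                                         # paraphrased decision, dispute, or cause
    mass: float = 0.0                                 # accumulated evidence weight
    voices: set[str] = field(default_factory=set)     # who ratified it (user, assistant)
    sources: list[str] = field(default_factory=list)  # session/turn anchors

@dataclass
class Edge:
    kind: str                   # "->" generic link, "⊥" dispute, "⇒" cause
    members: tuple[str, ...]    # a hyperedge may join more than two beliefs

# Two of the four survivors from the debugging session above, plus the
# dispute edge the guard would later enforce:
decision = Belief("pin the fixture seed", mass=2.0, voices={"user", "assistant"})
bandaid  = Belief("bump the timeout", mass=0.5, voices={"assistant"})
dispute  = Edge("⊥", ("pin the fixture seed", "bump the timeout"))
```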
pipx is the recommended path — a single global `bellamem` command, no `.venv` to remember, no PATH surgery:

```sh
pipx install bellamem

# or, from a local clone:
git clone https://github.com/immartian/bellamem
pipx install -e ./bellamem   # editable install, still global
```

Per-project venv also works:

```sh
cd your-project
python3 -m venv .venv
.venv/bin/pip install bellamem
```

Optional extras:

```sh
pipx inject bellamem 'sentence-transformers>=2.2'   # local embeddings
pipx inject bellamem 'openai>=1.0'                  # OpenAI embeddings + LLM EW

# or with pip:
pip install 'bellamem[st]'       # sentence-transformers
pip install 'bellamem[openai]'   # OpenAI
pip install 'bellamem[all]'      # both
```

Copy `.env.example` → `.env` in your project and fill in the backends you enabled. `.env` is gitignored.

Requirements: Python 3.10+. Git (Bella scopes per-project state via the git repo root). No other system dependencies.

Three retrieval modes — one for each question you actually ask about your memory. Most workflows live in these three commands:

| Command | Question |
|---|---|
| `bellamem expand "X"` | What do we believe about X, ranked by importance? |
| `bellamem surprises` | What just changed — what mattered? |
| `bellamem replay [X]` | What did we say — in what order? |

Plus utility commands for ingest, audit, render, prune, and bench:

```sh
# Ingest Claude Code sessions for the current project.
# Auto-runs R3 consolidation (merges near-duplicates) on new claims.
bellamem save

# Three retrieval modes — same memory, different questions:
bellamem expand "what did we decide about persistence"
bellamem surprises                          # top jumps, sign flips, disputes
bellamem replay                             # narrative timeline
bellamem replay "ad-hoc bandaid pattern"    # focused narrative

# The pre-edit pack: no recency, surfaces invariants + disputes + causes
bellamem before-edit "should I wrap this in try/except" --entity embed.py

# Health report: bandaid piles, duplicates, garbage field names, mass limbo
bellamem audit

# Render the graph as a picture (needs the [viz] extra or graphviz CLI)
bellamem render --out graph.svg                        # whole forest
bellamem render --out disputes.svg --disputes-only     # just ⊥ edges
bellamem render --out auth.svg --focus "auth tokens"   # subgraph around a focus

# Forget orphan leaves that never earned their place (dry run by default)
bellamem prune            # preview candidates
bellamem prune --apply    # actually remove them

# Empirically compare context strategies (flat, compact, RAG, Bella)
bellamem bench
```

Every command except `save`, `emerge`, `prune --apply`, and `scrub` is read-only.

The flow that lets you keep working past the context window without losing the thread is packaged into four slash commands.

```sh
bellamem install-commands   # writes ~/.claude/commands/bellamem.md
```

`/bellamem` now works in every Claude Code project on your machine. Per-project install (`--project`) is also supported if you want to commit the slash command into a specific repo.

| Command | What it does |
|---|---|
| `/bellamem` or `/bellamem resume` | Working-memory replay tail + long-term expand pack + top surprises. Run at session start. |
| `/bellamem save` | Ingest the current session (auto-consolidates), run audit, report top new surprises. Run before `/clear` or at end of day. |
| `/bellamem recall` | Mass-ranked beliefs about a topic, disputes included. Mid-session lookup. |
| `/bellamem why` | Pre-edit pack: invariants, disputes, causes, entity bridges. Run before a risky change. |
| `/bellamem replay` / `/bellamem audit` | Raw CLI output when you want to look at it directly. |
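Taken together, a typical day with the raw CLI might look like this — an illustrative sequence using only the commands documented above; the query strings and the `--entity` path are invented for the example:

```sh
# End of day: ingest the session, then see what changed.
bellamem save
bellamem surprises

# Next morning, in a fresh session: reload the decision state first.
bellamem expand "why does the sync test flake"
bellamem before-edit "raise the test timeout" --entity tests/test_sync.py
```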
The save → clear → resume loop:

```
/bellamem save     ← captures this session into the graph
/clear             ← wipe the context window (Claude Code built-in)
/bellamem resume   ← fresh assistant reconstructs where you were
```

On a well-tuned project, `/bellamem resume` comes back in ~30k tokens and contains enough to pick up the next decision without re-asking questions already answered. If it's much larger, run `bellamem emerge` to consolidate near-duplicates.

Install `bellamem-guard` as a Claude Code PreToolUse hook and an advisory pack (invariants + disputes + causes for the focus) is injected automatically before every Edit / Write / MultiEdit call — no manual invocation needed. The guard exit-2s when the edit re-suggests a rejected approach (a ⊥ dispute), refusing the tool call at the boundary.

Hook registration (once per project) in `.claude/settings.json`:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [{ "type": "command", "command": "bellamem-guard" }]
      }
    ]
  }
}
```

What lands on disk:

```
~/.claude/commands/bellamem.md   installed once (global slash command)
.claude/settings.json            PreToolUse hook registration (optional)
.graph/
  default.json                   belief graph (gitignored by default)
  default.emb.bin                belief embeddings, v3 binary sidecar
  embed_cache.json               embedding cache (pruned to live beliefs on save)
  llm_ew_cache.json              LLM EW cache (if BELLAMEM_EW=hybrid)
.env                             your API keys + embedder choice (never commit)
```

`.graph/` is gitignored by default.

`/compact` and Bella both compress a long session. The difference is load-bearing:

| | /compact | Bella |
|---|---|---|
| Output | One narrative summary (~2000 tokens) | Queryable belief graph (~3k per retrieval) |
| Shape | Prose | Beliefs + typed edges (→, ⊥, ⇒) + mass + voices + sources |
| Usage | Replaces history; summary becomes new context | Load on demand per turn; three retrieval modes |
| Preserves | Broad topics, major decisions, flow | Paraphrased decisions, rejected approaches, cause-effect chains, self-observations, line numbers |
| Loses | Identifiers, ⊥ corrections, causal structure | Tool outputs, file contents, conversational texture |
| Cross-session | None — dies with the session | Full — graph persists, next session inherits it |

On our bench, the compact-style contender (gpt-4o-mini summary) scored an 8% LLM-judge rate; Bella's expand scored 92% at a comparable budget. The structural weakness of narrative summaries is that they preserve themes but lose the specific decisions, corrections, and causes an agent actually needs to act. The two are complementary, not competing: `/compact` keeps the feel of the conversation going inside one session. Bella keeps the decisions available across sessions.

A preview of the v0.1.1 3D viz rendering a real Bella belief hypergraph — roughly 1,800 beliefs from a month of Claude Code sessions on Bella itself, across eight topical fields. Drag to rotate; the replay bar scrubs history so you can watch the graph accumulate decision by decision. Click the image above to play the .webm. (GitHub doesn't reliably embed `<video>` tags pointing at raw files, so the poster + link is the universal fallback.)

Latest measurement: `benchmarks/v0.0.4rc1.md` (2026-04-10, budget = 1200 tokens, LLM judge enabled, 13-item hand-labeled corpus, 1834-belief forest).
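For intuition about the exit-2 mechanism, here is a minimal sketch of a PreToolUse-style guard, assuming Claude Code's hook protocol (the hook receives the tool call as JSON on stdin; exit code 2 refuses the call and feeds stderr back to the model). This is not `bellamem-guard`'s implementation, and the dispute lookup is a hardcoded stand-in for querying the belief graph's ⊥ edges:

```python
#!/usr/bin/env python3
"""Sketch of an exit-2 pre-edit guard. Illustrative only."""
import json
import sys

call = json.load(sys.stdin)                      # PreToolUse payload from stdin
proposed = json.dumps(call.get("tool_input", {}))

# Hypothetical stand-in: patterns the graph marked as rejected bandaids.
rejected = ["bump the timeout", "time.sleep("]

for pattern in rejected:
    if pattern in proposed:
        print(f"blocked: edit re-suggests a rejected approach ({pattern!r})",
              file=sys.stderr)
        sys.exit(2)                              # refuse the tool call at the boundary

sys.exit(0)                                      # no dispute hit: allow the edit
```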
```
metric            flat_tail   compact   rag_topk   expand   before_edit
------------------------------------------------------------------------
exact hit rate        15 %       0 %       15 %      69 %        46 %
embed hit rate        23 %      31 %       31 %      85 %        77 %
llm judge rate         0 %       8 %       31 %      92 %        69 %
avg tokens used       1200       602       1161      1143         964
```

By LLM-judge rate: flat_tail (0%) < compact (8%) < rag_topk (31%) < before_edit (69%) < expand (92%).

Headline story — compare to v0.0.2: as the forest grew from the v0.0.2 dogfood snapshot to 1834 beliefs, rag_topk collapsed from 85% → 31% LLM judge (cosine top-k pulls up more plausible-looking-but-wrong neighbors in a larger forest), while expand held at 92%. The gap from expand to the next-best contender widened from 15pp to 61pp. Structured mass-weighted retrieval scales with forest size; cosine top-k doesn't. The retrieval code path (`core/expand.py`, `core/bella.py`) is unchanged between v0.0.2 and v0.0.4rc1 — every delta is a property of forest growth, not algorithm changes. See `benchmarks/README.md` for the versioning convention and when to re-run. A toy sketch of the scaling contrast follows below.
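The contrast can be sketched in a few lines — hypothetical names and scoring, not `core/expand.py`. Cosine top-k ranks every belief by similarity alone, so a bigger forest means more plausible-but-wrong neighbors near the top; a structured pass starts from seed beliefs, walks typed edges, and ranks by accumulated mass, which growth tends to reinforce rather than dilute:

```python
# Toy contrast: cosine top-k vs mass-weighted graph expansion. Illustrative.
import numpy as np

def topk_cosine(query_vec: np.ndarray, belief_vecs: np.ndarray, k: int = 5):
    """Flat RAG: rank every belief by cosine similarity alone. As the
    forest grows, plausible-but-wrong neighbors crowd the top-k."""
    sims = belief_vecs @ query_vec / (
        np.linalg.norm(belief_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]

def expand_by_mass(seed_ids: set[str],
                   edges: list[tuple[str, str]],
                   mass: dict[str, float],
                   hops: int = 2) -> list[str]:
    """Structured expansion: start from seed beliefs, follow edges outward,
    and rank the reachable set by accumulated evidence mass, so ratified
    decisions stay on top as the forest grows."""
    seen = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(hops):
        frontier = {dst for src, dst in edges if src in frontier} - seen
        seen |= frontier
    return sorted(seen, key=lambda b: -mass.get(b, 0.0))
```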
Bella lives alongside the agent, not inside it. That boundary is load-bearing and currently unmoved: we can't reach in and rewrite the context window directly. What we have today is advisory:

- **No direct context-window control.** Bella can't swap active tokens, evict irrelevant context, or replace the window wholesale. The agent still controls what it attends to; Bella can only offer packs the agent can choose to read.
- **`/compact` stays LLM-driven.** Claude Code's native `/compact` writes a narrative summary via an LLM call. A PreCompact hook that lets Bella substitute a graph-backed compaction would unlock most of the remaining wins — and that hook surface does not exist in Claude Code today. We can't intercept it from outside.
- **Save/clear/resume is a manual pattern.** You run `/bellamem save` → `/clear` → `/bellamem resume` yourself. It works, but it's a human-in-the-loop ritual, not an autonomous context manager.
- **The edit guard is a tool-call boundary, not a semantic gate.** `bellamem-guard` injects an advisory pack before every edit and exit-2s on a dispute re-suggestion, but it sees tool-call text, not model intent. An agent that ignores the advice can still try the edit; the block is at the boundary, not deeper in the model.
- **One adapter at a time.** Claude Code works today. Codex and others need their own turn-pair reaction classifier and source stamper.

The common thread: every limitation above is about how much of the agent's context lifecycle we can observe and influence from outside. With deeper hooks — or a coding agent that exposed its context as a first-class API — a graph memory like Bella could drive the compaction cycle itself instead of being handed the leftovers. We expect the upside of real context-window control to be substantial. For now, the honest frame is: Bella is the memory layer; the agent is still the window manager.

v0.1.0 — alpha, dogfooded on its own construction. Bella was built in Claude Code sessions that were themselves ingested into the Bella being built. When the assistant drifted into an ad-hoc bandaid pattern during development, the user's correction landed in the graph as the highest-surprise belief of the session. That kind of self-observation is the point.

Since v0.0.2:

- **v0.0.3** — per-project `.graph/`, automatic R3 consolidation on ingest, source grounding + narrative replay, structural pruning, `bellamem save` defaulting to the current session with incremental ingest, and embed-cache pruning bounded to live beliefs.
- **v0.0.4rc1** — storage split: belief embeddings moved out of `default.json` into a `default.emb.bin` sidecar (v3 format), cutting non-vector operations' load time from ~2s to ~500ms. The `bellamem-guard` PreToolUse hook ships: advisory pack before every edit, exit-2 block on dispute re-suggestions. Embedder batching reduces save latency.
- **v0.1.0** — log-odds decay gated on `BELLAMEM_DECAY=on`: on every save, non-exempt beliefs fade exponentially toward the 0.5 prior with a 30-day half-life (reserved fields, `mass_floor` pins, ⊥ disputes, and ⇒ causes are exempt). New `bellamem decay` subcommand for dry-run preview + `--apply`. The v3 → v4 snapshot format adds a `decayed_at` header. See the "Decay and reinforcement — the steady state" section of THEORY.md for the collision math; a worked sketch of the decay rule follows after the layout below.
- **v0.1.1 (planned)** — decay on by default after dogfooding validates the steady state, Three.js 3D viz with temporal replay, and graph-backed compaction when the hook surface allows.

See CHANGELOG.md for details.

Repository layout:

```
bellamem/
  core/
    gene.py         Belief + Gene + Jaynes accumulation + jumps + sources
    ops.py          the seven operations: CONFIRM, AMEND, ADD, DENY, CAUSE, MERGE, MOVE (complete mutation API)
    bella.py        forest + routing + entity index
    embed.py        pluggable embedders (Hash/ST/OpenAI) + .env
    store.py        v3 split snapshot (graph JSON + embeddings.bin) + signature check
    expand.py       expand() + expand_before_edit() with freshness weight
    emerge.py       R3 consolidation — merge + rename
    audit.py        entropy signals: piles, glut, duplicates, limbo, names
    surprise.py     top Jaynes jumps + sign flips + dispute formations
    replay.py       chronological retrieval from source-grounded beliefs
  adapters/
    chat.py         voice-aware regex EW + turn-pair reaction classifier
    claude_code.py  .jsonl reader + system-noise filter + source stamping
    llm_
```
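The v0.1.0 decay rule is easiest to see in log-odds space: the prior p = 0.5 maps to log-odds 0, so scaling log-odds by 2^(−Δt / 30 days) fades any belief exponentially toward the prior with a 30-day half-life. A worked sketch under that reading — illustrative, not Bella's code:

```python
import math

def decay_toward_prior(p: float, days_idle: float, half_life: float = 30.0) -> float:
    """Fade a belief probability toward the 0.5 prior in log-odds space.
    log-odds are 0 exactly at p = 0.5, so halving them every `half_life`
    days halves the belief's distance from the prior."""
    log_odds = math.log(p / (1.0 - p))
    log_odds *= 0.5 ** (days_idle / half_life)
    return 1.0 / (1.0 + math.exp(-log_odds))

# One half-life: a 0.9 belief (log-odds ln 9) decays to 0.75 (log-odds ln 3).
print(round(decay_toward_prior(0.9, 30.0), 2))   # -> 0.75
```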

This analysis was produced by the Genesis Park editorial team using AI. The original article is available via the source link.
