Show HN: Memwright – Self-hosted memory for multi-agent teams, no LLM in the path
Original source: hackernews · Summary and analysis by Genesis Park
Summary
Memwright is MIT-licensed, self-hosted memory infrastructure designed for multi-agent systems, offering deterministic retrieval with no LLM in the critical path. To address agents relearning the same facts on every run and losing context between handoffs, it provides namespace isolation, contradiction resolution, and token-budget management. This lets organizations keep data sovereignty while building persistent, efficient collaboration between agents.
Body
§ 00 · MASTHEAD · FILED UNDER INFRASTRUCTURE · BY SURENDRA SINGH · FOR PUBLICATION · MEMWRIGHT: A MEMORY JOURNAL FOR AGENTIC SYSTEMS · VOL. 02 · REV. 0.1 · EST. 2026 · NEW YORK · MIT

Self-hosted · Deterministic retrieval · No LLM in the critical path

Memwright doesn't search. It remembers.

Agent prototypes don't survive production. Memory is why. Single agents rediscover the same facts every run. Multi-agent pipelines are worse: the planner's decisions never reach the executor, the researcher's findings never reach the reviewer. Teams paper over it by stuffing giant prompts between agents, burning tokens on stale context. That's a workaround, not an architecture.

Memory accumulates. Load primes. Four writes across Mon–Thu land as persisted memories. Friday morning a fresh Portfolio Planner wakes up to a new task, calls mem.load_context(), and Memwright ranks, dedupes, and budget-fits all four back into the context window. The agent resumes with full continuity: earnings signal, risk cap, prior stance, compliance precedent. This is not RAG over documents; it is the agents' own history replayed into a fresh context. Zero LLM calls in the critical path.

Production-grade memory infrastructure for multi-agent systems: the memory tier your agents need when they leave your laptop and start running in production. Namespace isolation · RBAC · Provenance tracking · Temporal correctness · Ranked retrieval · Token budgets, built for orchestrator-worker and planner-executor pipelines. Runs as a Python library, REST API, or containerized service. No SaaS middleman, no per-seat fees, no vendor lock-in.
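The load-and-prime step described above (rank, dedupe, budget-fit) can be sketched in plain Python. This is an illustrative stand-in, not Memwright's implementation; the memory fields `text`, `score`, and `ts` and the 4-characters-per-token estimate are assumptions:

```python
# Illustrative stand-in for mem.load_context(): rank persisted memories,
# drop exact duplicates, and greedily fit the rest under a token budget.

def load_context(memories, budget, est_tokens=lambda m: len(m["text"]) // 4):
    """Return the best-scoring, deduplicated memories that fit the budget."""
    seen, picked, used = set(), [], 0
    # Highest confidence first; newer memories break ties.
    for m in sorted(memories, key=lambda m: (-m["score"], -m["ts"])):
        if m["text"] in seen:              # dedupe repeated facts
            continue
        cost = est_tokens(m)
        if used + cost > budget:           # budget-fit: skip what won't fit
            continue
        seen.add(m["text"])
        picked.append(m)
        used += cost
    return picked

# Writes from earlier in the week, one of them a repeat.
mems = [
    {"text": "AAPL beat Q3 consensus", "score": 0.9, "ts": 4},
    {"text": "Risk cap: 2% per name", "score": 0.8, "ts": 3},
    {"text": "AAPL beat Q3 consensus", "score": 0.7, "ts": 1},
]
ctx = load_context(mems, budget=12)
```

A fresh agent would then have `ctx` replayed into its prompt; note that no model call is needed to assemble it.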
```python
from agent_memory import AgentMemory

mem = AgentMemory("./store")
mem.add("Order service uses event sourcing", entity="order-service", tags=["arch"])
mem.recall("how is the order service structured?", budget=2000)
```

Install with `poetry add memwright`. MIT · Python 3.10–3.14 · Production deploy in one command.

§ Spec Sheet

| | |
|---|---|
| Storage Roles | Doc · Vector · Graph |
| Interfaces | Python · REST · MCP |
| Retrieval Layers | 5 |
| RBAC Roles | 6 |
| Cloud Targets | Amazon Web Services · Microsoft Azure · Google Cloud Platform |
| License | MIT |

Why agent prototypes don't survive production

Agent prototypes don't survive production, and memory is usually why. What we hear from teams building agent pipelines:

"We had a planner, a coder, a reviewer, a deployer: four agents in a pipeline. None of them knew what the others learned. We were passing giant prompts between them and burning tokens on stale information."
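A toy illustration of the shared-store idea behind that complaint: one agent writes what it learned, and a later agent recalls only what fits its budget. `AgentMemory` here is a tiny in-memory stand-in with naive word-overlap recall and a 4-chars-per-token estimate, not the real library:

```python
# Toy stand-in for a shared memory store between pipeline agents.
# Recall is naive word overlap plus a greedy token-budget pack.

class AgentMemory:
    def __init__(self):
        self.rows = []

    def add(self, content, entity=None, tags=()):
        self.rows.append({"content": content, "entity": entity, "tags": list(tags)})

    def recall(self, query, budget):
        words = set(query.lower().split())
        hits = [r["content"] for r in self.rows
                if words & set(r["content"].lower().split())]
        picked, used = [], 0
        for text in hits:                       # greedy token-budget pack
            cost = max(1, len(text) // 4)
            if used + cost <= budget:
                picked.append(text)
                used += cost
        return picked

mem = AgentMemory()
# Planner writes; a later coder or reviewer reads the same store.
mem.add("Use event sourcing for the order service", entity="order-service")
mem.add("Order service owns the payments table", entity="order-service")
ctx = mem.recall("how is the order service structured?", budget=50)
```

The point is the shape of the interaction, not the retrieval quality: findings persist in one store instead of being re-stuffed into every prompt.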
| Without Memwright | With Memwright |
|---|---|
| 01 — Each agent starts blind — no knowledge of what others learned | 01 — Shared memory — planner writes, coder reads, reviewer sees both |
| 02 — Giant prompts passed between agents burn context tokens | 02 — Token-budget recall — each agent pulls only what fits |
| 03 — No access control — any agent can overwrite any state | 03 — Six RBAC roles, namespace isolation, write quotas per agent |
| 04 — Contradicting facts from different agents go undetected | 04 — Contradictions auto-resolved — newer facts supersede older ones |
| 05 — Session ends, everything learned is gone forever | 05 — Persistent across sessions, pipelines, and agent restarts |

More agents, more sessions, more memories: retrieval gets better while context cost stays flat.

Orchestrator · Planner · Executor · Reviewer

Not a chatbot plugin. Infrastructure for agent teams.

Every recall and write is scoped to an AgentContext: a lightweight dataclass carrying identity, role, namespace, parent trail, token budget, write quota, and visibility. Contexts are immutable; spawning a sub-agent returns a new context with inherited provenance.

| # | Primitive | What it does |
|---|---|---|
| 01 | Namespace isolation | Every agent, project, or tenant gets its own namespace. Planner writes, coder reads, reviewer sees both. Isolated by default, shared when you configure it. |
| 02 | Six RBAC roles | Orchestrator, Planner, Executor, Researcher, Reviewer, Monitor: read-only observers to full admins. |
| 03 | Provenance tracking | Know which agent wrote which memory, when, and under which parent session. The reviewer can trace a decision back to the planner three sessions ago. |
| 04 | Cross-agent contradiction resolution | Agent A learns "user works at Google." Agent B learns "user works at Meta." Memwright auto-supersedes. Full history preserved. Zero inference calls in the critical path. |
| 05 | Token budgets per agent | recall(query, budget=2000): a summarizer uses 500 tokens; a deep reasoner uses 5,000. Each agent receives exactly what fits in its context window. |
| 06 | Write quotas & review flags | Rate-limit writes per namespace, flag writes for human review, add compliance tags for audit. |

Five layers · zero inference calls

Five layers. No LLM. Everything deterministic. When an agent calls recall(query, budget), five cooperating layers find, fuse, score, and fit the most relevant memories into the requested token ceiling. The store can hold ten million memories; the context window never sees more than the budget.

| # | Layer | Backend | Mechanism |
|---|---|---|---|
| 01 | Tag Match | SQLite | Tag index, FTS, exact + partial hits |
| 02 | Graph Expansion | NetworkX / AGE | Multi-hop BFS (depth 2) |
| 03 | Vector Search | ChromaDB / pgvector | Cosine similarity |
| 04 | Fusion + Rank | In-process | RRF (k=60) + PageRank + confidence decay |
| 05 | Diversity + Fit | In-process | MMR (λ=0.7) + greedy token-budget pack |

Every memory is persisted across three complementary stores. Every supported backend combination is just a different technology choice for one or more of these roles.

| Role | What it stores | Why it exists |
|---|---|---|
| Document store | The source of truth: content, tags, entity, category, timestamps, provenance, confidence | Where add() commits; where recall() hydrates final memory text |
| Vector store | Dense embedding per memory, keyed by memory ID | Finds memories by meaning when no tag or word overlaps the query |
| Graph store | Entity nodes + typed edges (uses, authored-by, supersedes) | Connects memories indirectly: a query for "Python" can surface "Django" via the graph |

Example framing: a market intelligence system feeding a financial advisor pipeline. Every signal the desk cares about lands here.
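A minimal sketch of layers 04 and 05 from the table above, combined with the IDs-until-hydrate design: the three indexes return only memory IDs, Reciprocal Rank Fusion (k=60) merges the rankings, and texts are hydrated from the document store just before an MMR-style diversity check and greedy token-budget pack. All scoring details here (rank-derived relevance, lexical overlap in place of cosine similarity, 4 chars per token) are illustrative assumptions, not Memwright's actual code:

```python
# Sketch of fusion (layer 04) and diversity + fit (layer 05).
# Only IDs travel between stages; text is hydrated at the end.

def rrf_fuse(ranked_id_lists, k=60):
    """Merge ranked ID lists from tag, graph, and vector search into one order."""
    scores = {}
    for hits in ranked_id_lists:
        for rank, mem_id in enumerate(hits):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def diversity_fit(ids, doc_store, budget, lam=0.7):
    """Hydrate texts, drop near-duplicates (MMR-style), pack under the budget."""
    def tokens(t):
        return max(1, len(t) // 4)
    def overlap(a, b):  # crude lexical Jaccard stands in for cosine similarity
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, len(wa | wb))
    picked, used = [], 0
    for i, mid in enumerate(ids):
        text = doc_store[mid]                  # hydrate: first time text appears
        relevance = 1.0 / (i + 1)              # rank-derived relevance proxy
        redundancy = max((overlap(text, p) for p in picked), default=0.0)
        if lam * relevance - (1 - lam) * redundancy <= 0:
            continue                            # too close to something picked
        if used + tokens(text) > budget:
            continue                            # would bust the token ceiling
        picked.append(text)
        used += tokens(text)
    return picked

doc_store = {
    "m1": "JPM CFO transition confirmed",
    "m2": "JPM CFO transition confirmed today",
    "m3": "Tariff risk for semiconductor supply",
}
# Ranked ID lists from the tag, graph, and vector indexes.
fused = rrf_fuse([["m1", "m3"], ["m2", "m1"], ["m3", "m1", "m2"]])
ctx = diversity_fit(fused, doc_store, budget=20)
```

Note how the near-duplicate "m2" is dropped by the redundancy term even though it ranks above budget-fitting alternatives; that is the MMR trade-off λ controls.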
```mermaid
flowchart TB
    N1["News wires<br/>Reuters • Bloomberg<br/>breaking headlines"]
    N2["Market data<br/>ticks • OHLC<br/>prices • volumes"]
    N3["Earnings reports<br/>10-K • 10-Q • 8-K<br/>guidance • surprises"]
    N4["Leadership changes<br/>CEO • CFO • Board<br/>appointments • exits"]
    N5["Geopolitical events<br/>tariffs • sanctions<br/>policy • conflict"]
    subgraph SOURCES ["§ MARKET INTELLIGENCE SOURCES"]
        direction LR
        N1 ~~~ N2 ~~~ N3 ~~~ N4 ~~~ N5
    end
    SOURCES ==>|"mem.add(content, tags, entity, provenance, ts)"| API{{"INGEST API"}}
    D1["Document store<br/>insert row<br/>content • tags • entity<br/>ts • source • confidence"]
    D2["Vector store<br/>embed text → 384-d vector<br/>keyed by memory ID"]
    D3["Graph store<br/>extract entities + edges<br/>issuer • sector • person<br/>country • event"]
    subgraph WRITE ["§ PARALLEL WRITES — one logical transaction"]
        direction LR
        D1 ~~~ D2 ~~~ D3
    end
    API --> D1
    API --> D2
    API --> D3
    CD["Contradiction check<br/>per entity • per field<br/>e.g. JPM CFO is X vs new JPM CFO is Y"]
    D1 ==> CD
    D2 ==> CD
    D3 ==> CD
    CD ==>|newer fact wins| SUP["Supersede older fact<br/>keep in timeline for audit"]
    SUP ==> DONE["Committed • recallable"]
    style SOURCES fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style WRITE fill:#FBF8F1,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style API fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style CD fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style SUP fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style DONE fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style N1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N4 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N5 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
```

The three writes commit as one logical transaction.
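The contradiction-check step in the pipeline above can be sketched as a stand-in in a few lines: a newer fact for the same entity and field supersedes the older row, which stays in the store for audit. Field names are illustrative, not Memwright's schema:

```python
# Illustrative "newer fact wins" supersede logic: nothing is deleted; older
# rows are marked superseded so the timeline stays reconstructible.
import itertools

_ids = itertools.count(1)
store = []  # append-only list of memory rows

def add_fact(entity, field, value, ts):
    row = {"id": next(_ids), "entity": entity, "field": field,
           "value": value, "ts": ts, "superseded_by": None}
    # Any older, still-current row for the same entity+field is superseded.
    for old in store:
        if (old["entity"], old["field"]) == (entity, field) \
                and old["superseded_by"] is None and old["ts"] <= ts:
            old["superseded_by"] = row["id"]
    store.append(row)
    return row

def current(entity, field):
    live = [r for r in store
            if (r["entity"], r["field"]) == (entity, field)
            and r["superseded_by"] is None]
    return live[-1]["value"] if live else None

add_fact("JPM", "cfo", "X", ts=1)   # e.g. "JPM CFO is X"
add_fact("JPM", "cfo", "Y", ts=2)   # newer fact: "JPM CFO is Y"
```

An auditor can still read the first row to see what the desk believed at ts=1 and exactly which row superseded it.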
On SQL backends, the logical transaction is a real DB transaction; on distributed backends it is sequenced with best-effort rollback. Contradictions don't overwrite: older facts are superseded and retained in the timeline so auditors can reconstruct what the desk knew, and when.

A different kind of ingestion: the memories are not external feeds; they are the agents' own conversation as they debate a trade proposal. Every turn is captured with speaker, timestamp, entity, and a decision marker.

```mermaid
flowchart TB
    T1["Portfolio Planner<br/>“Proposing to add 2% JPM<br/>ahead of Q3 print”"]
    T2["Market Researcher<br/>“Consensus EPS is +4% QoQ,<br/>whisper number suggests beat”"]
    T3["Risk Analyst<br/>“Tariff headline risk this week<br/>— raise stop to 8%”"]
    T4["Compliance Reviewer<br/>“Ok with position limit.<br/>Logging rationale.”"]
    DEC["Decision reached<br/>buy 2% JPM • stop 8% • pre-earnings"]
    subgraph CONV ["§ AGENT-TO-AGENT CONVERSATION — trade proposal thread"]
        direction LR
        T1 --> T2
        T2 --> T3
        T3 --> T4
        T4 --> DEC
    end
    CAP["Turn-level capture<br/>speaker • utterance • ts<br/>entity: JPM • topic: position sizing<br/>kind: conversation"]
    CONV ==>|auto-capture hook<br/>every turn| CAP
    CAP ==>|"mem.add(content, speaker, thread_id, kind='chat')"| API{{"INGEST API"}}
    DOC[("Document store<br/>turn rows, thread_id,<br/>speaker, ts, decision flag")]
    VEC[("Vector store<br/>embedding per turn")]
    GR[("Graph store<br/>edges: agent → entity<br/>agent → decision")]
    API --> DOC
    API --> VEC
    API --> GR
    DONE["Thread memorialised<br/>replayable • attributable • auditable"]
    DOC ==> DONE
    VEC ==> DONE
    GR ==> DONE
    style CONV fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style CAP fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style API fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style DONE fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style T1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T4 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style DEC fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style DOC fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style VEC fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style GR fill:#FBF8F1,stroke:#1A1614,color:#1A1614
```

Different from feed ingestion: the source is the agents themselves, not the outside world. Every turn is attributed to a speaker, tied to a thread, and flagged if it contained a decision. Nothing is paraphrased: the verbatim utterance is preserved so the reasoning can be reconstructed under audit.

Same market intelligence system, but now the Portfolio Planner asks a real question ahead of the morning call.

```mermaid
flowchart TB
    U["Financial Advisor<br/>typing into chat UI<br/>ahead of the 8am call"]
    BUBBLE["Chat message<br/>What do we know about JPM’s CFO transition<br/>and the fallout for US regional banks?"]
    subgraph CHAT ["§ ADVISOR CHAT INTERFACE — human in the loop"]
        direction LR
        U ==> BUBBLE
    end
    CHAT ==>|routed to| AGENT(["Portfolio Planner agent<br/>decomposes intent • issues recall"])
    Q1["JPM CFO transition"]
    Q2["Semiconductor supply-chain<br/>risk after latest tariff move"]
    Q3["Earnings surprises in<br/>US regional banks, last 90 days"]
    subgraph QUERIES ["§ DECOMPOSED RECALL QUERIES — what the agent actually asks memwright"]
        direction LR
        Q1 ~~~ Q2 ~~~ Q3
    end
    AGENT ==> QUERIES
    QUERIES ==>|"mem.recall(query, budget=2000)"| API{{"RECALL API"}}
    L1["01 • Tag Match<br/>→ document store<br/>FTS on JPM, CFO,<br/>tariff, earnings"]
    L2["02 • Graph Expansion<br/>→ graph store<br/>BFS: JPM → CFO<br/>→ Jeremy Barnum"]
    L3["03 • Vector Search<br/>→ vector store<br/>cosine on query<br/>top-K nearest embeddings"]
    subgraph SOURCES ["§ STAGE A — parallel sources • fan-out across 3 indexes"]
        direction LR
        L1 ~~~ L2 ~~~ L3
    end
    API --> L1
    API --> L2
    API --> L3
    IDS[("Candidate memory IDs<br/>~100s • deduped")]
    L1 --> IDS
    L2 --> IDS
    L3 --> IDS
    L4["04 • Fusion & Rank<br/>RRF k=60 • PageRank boost on central entities<br/>• confidence decay on stale prints"]
    L5["05 • Diversity & Fit<br/>MMR λ=0.7 — drop near-duplicate news wires<br/>• greedy pack under 2,000 tokens"]
    OUT["Portfolio Planner context"]
    REPLY["Chat reply to advisor<br/>sourced • dated • auditable"]
    IDS ==>|hydrate from doc store| L4
    L4 ==> L5
    L5 ==>|ranked memories ≤ budget<br/>zero LLM calls in the path| OUT
    OUT ==>|grounded answer streamed to chat| REPLY
    style CHAT fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style U fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style BUBBLE fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style AGENT fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style REPLY fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style QUERIES fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style SOURCES fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style API fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style L1 fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style L2 fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style L3 fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style IDS fill:#F5F1E8,stroke:#1A1614,color:#1A1614
    style L4 fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style L5 fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style OUT fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style Q1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style Q2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style Q3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
```

Only memory IDs travel between layers until the hydrate step. A store with ten million market-intel rows still returns a tight result set inside the caller's token ceiling. Graph expansion is the step that lets "tariff" surface memories about "TSMC" and "Nvidia" without either word appearing in the query.

A different kind of recall: no human in the loop. An agent resuming a task, or handing off to a peer, pulls back its own prior working context: earlier decisions, peer rationale, what was true at the last checkpoint.
```mermaid
flowchart TB
    RESUME["Risk Analyst resuming mid-task<br/>or Compliance Reviewer taking handoff<br/>no user prompt — agent self-initiated"]
    INTENT["Context intent<br/>“what did the desk already decide<br/>about JPM this week, and why?”"]
    RESUME ==> INTENT
    INTENT ==>|"mem.recall_context(thread_id, entity='JPM', since=7d)"| API{{"CONTEXT RECALL API"}}
    F1["Thread filter<br/>same thread_id<br/>or same namespace"]
    F2["Entity filter<br/>entity = JPM<br/>and related via graph"]
    F3["Temporal filter<br/>within last 7d<br/>supersedes resolved"]
    F4["Spea…"]
```
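A sketch of how the context-recall filters above could compose: thread, entity, and temporal predicates ANDed over memory rows, with superseded rows excluded. The field names (`thread_id`, `entity`, `ts`, `superseded_by`) are assumptions for illustration, not Memwright's schema:

```python
# Illustrative composition of context-recall filters over stored rows.

def recall_context(rows, thread_id=None, entity=None, since_ts=None):
    def keep(r):
        return ((thread_id is None or r["thread_id"] == thread_id)
                and (entity is None or r["entity"] == entity)
                and (since_ts is None or r["ts"] >= since_ts)
                and r.get("superseded_by") is None)   # supersedes resolved
    # Return survivors in timeline order so the agent replays decisions in sequence.
    return sorted((r for r in rows if keep(r)), key=lambda r: r["ts"])

rows = [
    {"thread_id": "t1", "entity": "JPM", "ts": 10, "text": "buy 2% JPM"},
    {"thread_id": "t1", "entity": "JPM", "ts": 3, "text": "old stop 5%",
     "superseded_by": 9},
    {"thread_id": "t2", "entity": "TSMC", "ts": 11, "text": "tariff risk"},
]
ctx = recall_context(rows, thread_id="t1", entity="JPM", since_ts=1)
```

The superseded stop-loss row and the unrelated-thread row are both filtered out; only the current JPM decision is replayed.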
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.