Show HN: 81% on LongMemEval with PgVector and BM25 – Memory for Claude Code

hackernews | | 📦 open source
#ai models #bm25 #claude #llama #openai #pgvector #semantic memory #self-hosted
Original source: hackernews · Summarised and analysed by Genesis Park

Summary

The open-source tool 'Claude Echoes' addresses a core limitation of AI coding assistants, losing context every time a session resets, by recording every user prompt and response and making them searchable by meaning. It runs entirely locally on Postgres (pgvector) plus a local Ollama embedding model, installs in about 10 minutes, and is notably lightweight at roughly 800 lines of code. It preserves the actual conversation verbatim rather than summarising, and achieved a strong 81.0% retrieval accuracy on the LongMemEval benchmark (ICLR 2025). Within any session, the '/recall' command semantically retrieves past conversations in under 200ms, preserving continuity across work sessions.

Full text

Verbatim semantic memory for Claude Code sessions. Every prompt, every response, every project — searchable by meaning, across every session you've ever run. Self-hosted. Free. ~10 minutes to install.

"I ran three sessions yesterday on the same project. Which one fixed the bug?" Echoes answers that question in 200ms, in plain English, from your verbatim history.

Benchmarked honestly. 81.0% on LongMemEval (ICLR 2025) with Sonnet 4.6 — pgvector cosine + BM25 RRF hybrid + temporal re-ranking. 100% on single-session-user retrieval (70/70). Full per-category breakdown, raw outputs, and reproduction steps in benchmarks/. No hardcoded answer patterns. No invented terminology. No cherry-picking.

Claude Code is great at one session. It has no idea what happened in the last one. Echoes captures every user prompt and assistant response as you work, stores them in Postgres with pgvector embeddings from a local Ollama model (nomic-embed-text, 768-dim), and exposes a /recall skill inside Claude Code so you can ask:

```
/recall when did we fix the partition key bug
/recall cors issue with azure functions --project azureprep
/recall vercel deployment failure --days 7
```

and get the actual verbatim conversation back — with project, role, date, and a session ID so you can pull the surrounding context.

No summarisation. No LLM deciding what's "important." No vendor lock-in. No cloud. Your data stays on your machine (or your VPS). The embedding model is a 137M-parameter local model. Retrieval is a single SQL query against an HNSW index. The entire thing is ~600 lines of Python + 200 lines of JavaScript.
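The benchmark line above mentions fusing pgvector cosine results with BM25 results via RRF (Reciprocal Rank Fusion). The repo's actual implementation isn't shown in the post; a minimal sketch of RRF over two ranked lists of hypothetical message IDs, assuming the standard k=60 constant, might look like:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each document scores sum(1 / (k + rank)) across the lists it appears
    in, so items ranked well by both retrievers bubble to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked IDs from a pgvector cosine search and a BM25 search
vector_hits = ["m17", "m03", "m42"]
bm25_hits = ["m03", "m99", "m17"]
fused = rrf_fuse([vector_hits, bm25_hits])
# "m03" wins: rank 2 + rank 1 beats "m17"'s rank 1 + rank 3
```

Temporal re-ranking (also mentioned above) would then adjust these fused scores by message recency before returning hits.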
Honest comparison:

| | claude-echoes | MemPalace | mem0 / letta |
|---|---|---|---|
| Claude Code hook integration | Native, single-file JS hook | No | No |
| Storage | Postgres + pgvector | ChromaDB | Varies |
| Embedding model | Local Ollama (free, no API) | Local | Usually OpenAI |
| Cross-session semantic recall | Yes | Yes | Yes |
| Install time | ~10 min (one docker-compose) | ~30 min | Varies |
| Designed for Claude Code specifically | Yes | No | No |
| Summarises / extracts "important" bits | No — verbatim only | No | Yes |
| LongMemEval_S score | 81.0% (Sonnet) / 64.4% (Haiku) — reproducible | 96.6% claimed (disputed) | Not reported |
| Lines of code | ~800 + ~600 benchmark | ~3000+ | Large |

If you already run a Postgres and don't use Claude Code, MemPalace is probably better for you. If you live in Claude Code all day and keep losing context between sessions, this is built for you.

Requires: Docker, a Claude Code install, and either Linux, macOS, or Windows (WSL2 or Git Bash).

```
git clone https://github.com/m4cd4r4/claude-echoes.git
cd claude-echoes
./scripts/install.sh
```

The installer will:

- Spin up Postgres 15 + pgvector and Ollama via docker-compose
- Pull the nomic-embed-text model (~275MB)
- Apply the schema migration
- Install the Claude Code hook into ~/.claude/hooks/claude-echoes/
- Install the /recall skill into ~/.claude/skills/claude-echoes/
- Print a test command to verify

Total time: ~10 minutes (most of which is pulling the ollama model).

After install, just use Claude Code normally. Every prompt and response is captured and embedded in the background with zero added latency. Within a few sessions, /recall starts returning useful results.
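The post doesn't reproduce the schema migration the installer applies. As a rough sketch only (table and column names are guesses, not the repo's actual migration), a pgvector schema for verbatim messages with a 768-dim HNSW cosine index could look like:

```python
# Hypothetical DDL sketch; the real migration lives in the repo.
# Column names mirror the insert tuple described later in the post:
# (session_id, project, role, content, embedding).
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS messages (
    id         bigserial   PRIMARY KEY,
    session_id text        NOT NULL,
    project    text        NOT NULL,
    role       text        NOT NULL CHECK (role IN ('user', 'assistant')),
    content    text        NOT NULL,            -- verbatim, never summarised
    created_at timestamptz NOT NULL DEFAULT now(),
    embedding  vector(768)                      -- nomic-embed-text output
);

-- HNSW index with cosine distance, matching a cosine-distance ORDER BY
CREATE INDEX IF NOT EXISTS messages_embedding_hnsw
    ON messages USING hnsw (embedding vector_cosine_ops);
"""
```

`vector(768)` matches the 768-dim nomic-embed-text model, and `vector_cosine_ops` is pgvector's operator class for cosine distance.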
Inside any Claude Code session:

```
/recall <query>
```

Filters:

```
/recall <query> --project <name>       # limit to one project
/recall <query> --days <n>             # only messages from last N days
/recall <query> --role user|assistant  # filter by sender
/recall <query> --limit <n>            # how many hits (default 10)
```

Or call the HTTP API directly:

```
curl "http://localhost:8088/search?q=how+did+we+fix+the+auth+bug&limit=5"
```

```
┌─────────────────┐
│   Claude Code   │
│   (your IDE)    │
└────────┬────────┘
         │ UserPromptSubmit / Stop hook
         ▼
┌─────────────────┐
│ chat-logger.js  │  (hooks/chat-logger.js, ~150 LOC)
└────────┬────────┘
         │ POST /message
         ▼
┌─────────────────┐
│  echoes-server  │  (server/app.py, ~200 LOC)
│     FastAPI     │◄──────┐
└────┬───────┬────┘       │
     │embed  │search      │ GET /search?q=...
     │on     │            │
     │insert │            │
     ▼       ▼            │
┌────────┐ ┌──────────┐   │
│ Ollama │ │ Postgres │   │
│ nomic- │ │ + HNSW   │   │
│ embed  │ │ pgvector │   │
└────────┘ └──────────┘   │
                          │
┌─────────────────┐       │
│  /recall skill  │───────┘
│   (SKILL.md)    │
└─────────────────┘
```

Flow on every message:

- Claude Code fires a UserPromptSubmit or Stop hook event
- chat-logger.js POSTs the message to http://localhost:8088/message
- Server calls Ollama to embed the content (~150ms, in-process)
- Server inserts (session_id, project, role, content, embedding) into Postgres
- Total latency: ~200ms, fully async, never blocks your prompt

Flow on /recall:

- You type /recall <query> in any session
- Skill calls GET /search?q=<query>&...filters
- Server embeds the query, runs ORDER BY embedding <=> $1::vector LIMIT N against the HNSW index
- Returns ranked hits with full verbatim content + metadata
- Claude renders them with similarity scores and project/date context

It does not summarise. Your history is verbatim. If you want a summary, ask Claude to summarise the recall results.
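The /recall filters above map naturally onto SQL WHERE clauses around that cosine-distance ORDER BY. A minimal sketch of how the server might build the parameterised query (function name and structure are mine, not the repo's; the caller would prepend the query embedding when executing):

```python
def build_recall_query(project=None, days=None, role=None, limit=10):
    """Build a parameterised pgvector search mirroring the /recall filters.

    pgvector's <=> operator is cosine distance, so ascending order means
    most similar first. Returns (sql, params); execute with the query
    embedding prepended, e.g. cur.execute(sql, [query_vec] + params).
    """
    where, params = [], []
    if project:
        where.append("project = %s")
        params.append(project)
    if days:
        where.append("created_at > now() - make_interval(days => %s)")
        params.append(days)
    if role:
        where.append("role = %s")
        params.append(role)

    parts = [
        "SELECT session_id, project, role, content, created_at,",
        "embedding <=> %s::vector AS distance",
        "FROM messages",
    ]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    parts.append("ORDER BY distance LIMIT %s")
    params.append(limit)
    return " ".join(parts), params

# e.g. /recall cors issue --project azureprep --days 7 --limit 5
sql, params = build_recall_query(project="azureprep", days=7, limit=5)
```

Because the ORDER BY uses the same cosine operator class as the HNSW index, Postgres can answer this as an approximate nearest-neighbour scan rather than a full table sort.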

This analysis was produced by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
