Show HN: Self-improving agent memory system, 92% R@5 on LongMemEval, PostgreSQL

hackernews | 📦 Open source
#ai deals #ai agents #claude #llama #longmemeval #openai #postgresql #memory systems #neuroscience
Source: hackernews · Summarized and analyzed by Genesis Park

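Summary

MemForge is a self-improving memory system for AI agents that mimics the sleep cycles of biological brains, revisiting and improving stored memories during idle time. A single PostgreSQL database handles vector search, the knowledge graph, and the rest of the storage layer, and the system reaches 92.0% recall (R@5) in hybrid mode on the LongMemEval benchmark. It has passed 8 rounds of security audit, and its three-tier (hot, warm, cold) memory management strengthens frequently used memories while letting unused ones decay naturally.

Body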

Neuroscience-inspired memory system for AI agents. Sleep cycles consolidate, revise, and strengthen memories, just like biological brains. MemForge manages agent memory across three tiers (hot → warm → cold) with vector search, a knowledge graph, LLM-driven reflection, procedural learning, and a memory revision engine that actively improves stored knowledge during idle periods.

For AI agents reading this: See CLAUDE.md for project instructions, code conventions, and architecture rules. See BACKLOG.md for open issues and improvement areas.

Beta: Production hardening is complete. MemForge has passed 8 rounds of security audit (all clean at MEDIUM+), ships with a CI/CD pipeline, and has been benchmarked on LongMemEval (92.0% R@5 hybrid mode, 88.0% R@5 keyword mode). The full test suite covers integration paths, LLM-dependent paths via mock providers, HTTP API endpoints, and load targets. See CONTRIBUTING.md for how to contribute and ROADMAP.md for the long-term plan.

Most AI memory systems are passive stores: they save and retrieve, but the stored knowledge never improves. MemForge is different:

- Memories get better over time. Sleep cycles actively rewrite low-confidence memories using LLM review, producing progressively more accurate knowledge.
- The system measures its own quality. Revision stability, retrieval correlation, and contradiction rates tell you whether memory is actually improving, without external benchmarks.
- Retrieval reinforces memory. Memories that are frequently accessed and lead to good outcomes become stronger. Unused memories decay naturally.
- One database does everything. PostgreSQL handles storage, full-text search, vector similarity, and graph traversal. No Neo4j, no Pinecone, no separate systems to manage.

See INTEGRATION.md for how to wire MemForge into your agent (any framework, any language). See SPECIFICATION.md for design philosophy and ARCHITECTURE.md for internal architecture.
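
The "Retrieval reinforces memory" point above can be illustrated with a small sketch. The excerpt does not give MemForge's actual scoring formula, so everything here is an assumption: exponential decay with a made-up half-life, a made-up boost applied on a good outcome, and hypothetical names (`MemoryRecord`, `decayedStrength`, `reinforce`) that are not part of the MemForge API.

```typescript
// Hypothetical sketch only: NOT MemForge's real scoring code.
// Assumes exponential decay while idle and a fixed boost on good outcomes.

interface MemoryRecord {
  id: string;
  strength: number;       // assumed to live in the range [0, 1]
  lastAccessedMs: number; // epoch milliseconds of the last retrieval
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const HALF_LIFE_DAYS = 30; // assumed: strength halves after 30 idle days
const BOOST = 0.2;         // assumed: fraction of remaining headroom gained

// Strength after decay, given how long the memory has gone unused.
function decayedStrength(m: MemoryRecord, nowMs: number): number {
  const idleDays = Math.max(0, nowMs - m.lastAccessedMs) / MS_PER_DAY;
  return m.strength * Math.pow(0.5, idleDays / HALF_LIFE_DAYS);
}

// Apply decay up to now, then strengthen only if the retrieval helped.
function reinforce(m: MemoryRecord, nowMs: number, goodOutcome: boolean): MemoryRecord {
  const current = decayedStrength(m, nowMs);
  const strength = goodOutcome ? current + BOOST * (1 - current) : current;
  return { id: m.id, strength: strength, lastAccessedMs: nowMs };
}
```

Moving the boost toward 1 by a fraction of the remaining headroom keeps strength bounded, so a heavily used memory saturates rather than growing without limit. A real system would likely tune the half-life per tier.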
- Tiered Memory: Hot (raw events) → Warm (consolidated, searchable, scored) → Cold (archived audit trail)
- Hybrid Search: Dual-tokenizer keyword (PostgreSQL FTS + trigram), semantic (pgvector HNSW), and asymmetric reciprocal rank fusion (semantic 1.5×) with keyword overlap boost, temporal proximity scoring, result deduplication, quality threshold, and entity detection
- Local In-Process Embeddings: EMBEDDING_PROVIDER=local uses @xenova/transformers (bge-small-en-v1.5 default) to generate embeddings in-process at ~137 embeddings/sec on CPU. Zero external dependencies: no Ollama or OpenAI required for semantic search.
- Query Understanding: Strips question scaffolding, auto-extracts time references as date filters, splits compound queries into independent sub-queries for multi-query retrieval
- Knowledge Graph: Entities and relationships extracted during consolidation, traversable via recursive CTEs
- Sleep Cycles: 5-phase background processor: scoring → triage → revision → graph maintenance → reflection. Includes autonomous weight adaptation.
- Memory Revision: LLM rewrites low-confidence memories (augment, correct, merge, compress)
- Reflection: LLM synthesizes higher-order insights and detects contradictions
- Meta-Reflection: Second-order reflection on reflections surfaces durable principles
- Procedural Memory: Condition→action rules extracted from reflections
- Active Ingest: Hints API, preference extraction, entity detection, supersession. Agents participate in their own memory management.
- Content Deduplication: Near-duplicate detection at ingest time prevents redundant storage
- Confidence Graduation: High-retrieval, high-feedback memories automatically strengthen
- Outcome Feedback: Structured outcome tags close the self-improvement loop
- Active Recall: Proactively surface relevant memories before agent actions
- Agent Resumption: Single endpoint returns a full context bundle for fast warm-start
- Entity Deduplication: Trigram-based duplicate entity detection and merge
- Temporal Intelligence: Time-bounded queries, decay scoring, timeline view
- Multi-Tenant: All operations scoped by agent ID
- Security Hardened: Zod validation, advisory locks, prompt injection boundaries, RLS, SSRF prevention, security headers. 8 audit rounds, all clean at MEDIUM+.
- MCP Server: 17 tools for Claude Code, Cursor, and MCP-compatible AI tools
- TypeScript SDK: Zero-dependency HTTP client for any JS runtime
- LLM Opt-In: Post-retrieval reranking and LLM-assisted ingest available but off by default

- Node.js >= 20
- PostgreSQL 16+ with pgvector and pg_trgm extensions
- Redis (optional; degrades gracefully if unavailable)

    git clone https://github.com/salishforge/memforge.git
    cd memforge
    npm install
    cp .env.example .env   # edit DATABASE_URL at minimum

    # Apply database schema (fresh install)
    psql "$DATABASE_URL" -f schema/schema.sql

    # If upgrading from v2.1.x, apply migrations in order:
    psql "$DATABASE_URL" -f schema/migration-v2.2.sql
    psql "$DATABASE_URL" -f schema/migration-v2.3.sql
    psql "$DATAB
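
The Hybrid Search feature listed above fuses keyword and semantic rankings with asymmetric reciprocal rank fusion, weighting the semantic side 1.5×. The excerpt does not show how MemForge implements this, so the following is only a minimal sketch of weighted RRF. The smoothing constant of 60 is the value conventionally used with RRF and is an assumption here, as are the keyword weight of 1.0 and the function name `fuseRankings`. The other signals the feature list mentions (overlap boost, temporal proximity, thresholds) are left out.

```typescript
// Hypothetical sketch of weighted reciprocal rank fusion (RRF).
// Not MemForge's implementation; only the 1.5x semantic weight comes from the README.

const RRF_K = 60;            // assumed: conventional RRF smoothing constant
const KEYWORD_WEIGHT = 1.0;  // assumed baseline weight
const SEMANTIC_WEIGHT = 1.5; // "semantic 1.5x" from the feature list

interface FusedResult {
  id: string;
  score: number;
}

// Each input is a list of unique result IDs, best match first.
function fuseRankings(keywordIds: string[], semanticIds: string[]): FusedResult[] {
  const scores = new Map<string, number>();

  const accumulate = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (RRF_K + rank));
    });
  };

  accumulate(keywordIds, KEYWORD_WEIGHT);
  accumulate(semanticIds, SEMANTIC_WEIGHT);

  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With equal weights, `fuseRankings(["a", "b"], ["b", "a"])` would tie, since each ID is first in one list and second in the other. With the 1.5× semantic weight, "b" comes out ahead because the semantic list prefers it.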
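
Entity Deduplication in the feature list is described as trigram-based, and pg_trgm is among the required extensions, so in practice the comparison presumably runs inside PostgreSQL. To show the underlying idea, here is a simplified, ASCII-only approximation of trigram similarity: lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, and divide the number of shared trigrams by the size of the union. The threshold and the function names are hypothetical.

```typescript
// Simplified approximation of trigram similarity for illustration.
// ASCII-only; the real pg_trgm extension handles more cases.

// Collect the set of 3-character windows over each padded word.
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  for (const word of words) {
    const padded = "  " + word + " "; // two spaces before, one after
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Shared trigrams divided by the union of both trigram sets.
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

const DUPLICATE_THRESHOLD = 0.6; // hypothetical cut-off, not from the README

function looksLikeDuplicate(a: string, b: string): boolean {
  return trigramSimilarity(a, b) >= DUPLICATE_THRESHOLD;
}
```

Under these assumptions, "PostgreSQL" and "Postgres" share 8 of 12 distinct trigrams (about 0.67) and would be flagged as merge candidates. Unrelated names share few or none.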

This analysis was written by the Genesis Park editorial team with the help of AI. The original post is available via the source link.
