Catalog of AI Knowledge Retrieval, Memory, and RAG Systems

hackernews | 📦 Open Source
#ai #claude #gpt-4 #knowledge #llama #memory #openai #rag #retrieval #hardware/semiconductor
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

A reference catalog has been published that analyzes an ecosystem of more than 100 AI knowledge and memory systems, which operate in ways analogous to biological memory mechanisms. To bring order to a fragmented technical landscape, the document classifies projects such as vector databases, RAG frameworks, graph retrieval, and agent memory by hardware compatibility and stack layer. In particular, it lets readers compare each solution's concrete requirements and deployment options across hardware-acceleration conditions such as Apple's Metal, NVIDIA's CUDA, and plain CPU, making it a useful guide for anyone building this kind of infrastructure.

Full Text

A reference catalog mapping the AI knowledge systems that store, retrieve, and reason. The thesis: every AI knowledge system is solving a problem that biological memory already solved, just with different tradeoffs. Humans remember through association (vector similarity), narrative (episodic memory), relationships (knowledge graphs), consolidation (sleep/dreaming), and forgetting (interference, decay). AI systems have converged on strikingly parallel architectures: vector databases for associative recall, conversation logs for episodes, graph RAG for relational reasoning, offline consolidation for memory pruning, and TTL/auto-expiry for managed forgetting.

This catalog exists because the landscape is fragmented across 100+ projects with overlapping names, unclear boundaries, and fast-changing codebases. It answers four questions: what each project actually does, what layer of the stack it occupies, what hardware it needs, and how it maps to the cognitive function it replaces. It is aimed at anyone building or evaluating AI knowledge infrastructure, particularly local-first setups where hardware compatibility (Metal vs CUDA vs CPU) determines what's even possible.

Sections are ordered from low-level infrastructure (vector DBs, embedding servers) to high-level cognition (memory management, dreaming/consolidation). If you're building a stack, read bottom-up. If you're evaluating a specific project, find its category and compare within that section. The cognition mapping table at the bottom connects everything back to the human memory mechanisms each category replaces. Cross-referenced against Awesome-Agent-Memory, Awesome-Memory-for-Agents, and Awesome-GraphRAG. GPU/platform data aligned to GPU Compute Platforms Breakdown.

Platform terminology follows the six-layer stack model (see companion doc):

| Tag | Meaning | Stack Layer |
|---|---|---|
| CPU | CPU-only, no GPU acceleration | — |
| CUDA | NVIDIA GPU via CUDA API + cuBLAS/cuDNN kernels | L1-L2 |
| Metal | Apple GPU via Metal API + native shaders (llama.cpp, Ollama) | L1 |
| MPS | Apple GPU via Metal Performance Shaders (PyTorch device="mps") | L2 via L4 |
| MLX | Apple GPU via Apple's ML framework (unified memory, lazy eval) | L4 |
| ROCm | AMD GPU via ROCm/HIP | L1-L2 |
| Vulkan | Cross-platform GPU via Vulkan compute | L1 |
| SYCL | Intel GPU via oneAPI/SYCL | L1 |
| Any | Platform-agnostic (SaaS API, Docker) | — |
| ⚠ | Unverified — check repo | — |

- ⭐ Stars: Approximate GitHub stars as of April 2026, rounded to the nearest K. Sourced from repo pages, search results, and Awesome lists. Counts change daily — treat them as order-of-magnitude indicators, not exact figures.
- Updated: Last release as of April 2026. ~ = approximate.
- Deploy: Self = self-hosted | Cloud = managed service | Both = either.

Key distinction: Metal = native GPU shaders (fast, no PyTorch) vs MPS = PyTorch routing through Metal Performance Shaders (convenient, with overhead). Your Ollama/llama.cpp stack uses Metal, not MPS.

Purpose: Store and search high-dimensional vectors (embeddings) by similarity. The foundation layer — everything else retrieves from here.
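Before the table, a minimal sketch of what this layer does, using Chroma (listed below) with its bundled default embedding model. The collection name, documents, and query are illustrative only; every store in this section exposes an equivalent add/query interface.

```python
import chromadb

# Minimal sketch: index two snippets, then retrieve the closest one by embedding similarity.
# Chroma embeds documents with its bundled default model; no separate GPU is required here.
client = chromadb.Client()                      # in-memory; use PersistentClient(path=...) to keep data
collection = client.create_collection("catalog-notes")

collection.add(
    documents=[
        "pgvector adds vector similarity search inside PostgreSQL 12+.",
        "Milvus is a distributed vector database deployed via Docker or Kubernetes.",
    ],
    ids=["doc-pgvector", "doc-milvus"],
)

hits = collection.query(query_texts=["Which option runs inside Postgres?"], n_results=1)
print(hits["documents"][0][0])                  # nearest chunk by embedding distance
```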
| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| FAISS | 33K | facebookresearch/faiss | C++/Py | CPU, CUDA | Py 3.8+, cmake | ~2026-Q1 | MIT |
| Chroma | 18K | chroma-core/chroma | Rust/Py | CPU | Py 3.9+ or Docker | ~2026-Q1 | Apache 2.0 |
| Milvus | 43K | milvus-io/milvus | Go/C++ | CPU, CUDA | Docker/K8s | ~2026-Q1 | Apache 2.0 |
| Weaviate | 15K | weaviate/weaviate | Go | CPU, CUDA (modules) | Docker | ~2026-Q1 | BSD-3 |
| Qdrant | 22K | qdrant/qdrant | Rust | CPU | Docker/binary, 1GB+ RAM | ~2026-Q1 | Apache 2.0 |
| Pinecone | — | pinecone.io | SaaS | Any | API key only | Active | Proprietary |
| LanceDB | 5K | lancedb/lancedb | Rust/Py | CPU | Py 3.9+, pip | ~2026-Q1 | Apache 2.0 |
| Vespa | 4K | vespa-engine/vespa | Java/C++ | CPU | Docker, 8GB+ RAM | ~2026-Q1 | Apache 2.0 |
| pgvector | 13K | pgvector/pgvector | C | CPU | PostgreSQL 12+ | ~2026-Q1 | PostgreSQL |
| Turbopuffer | — | turbopuffer.com | SaaS | Any | API key, $64/mo min | Active | Proprietary |

Note: Vector DBs themselves rarely need a GPU. The GPU matters for the embedding model that feeds vectors into them. FAISS is the exception — it can use CUDA for the similarity search itself.

Purpose: Orchestrate the full retrieve-then-generate pipeline: chunking, embedding, retrieval, prompt assembly, LLM call. Most are orchestrators that call external model servers.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| LangChain | 95K | langchain-ai/langchain | Py/JS | CPU (calls LLM APIs) | Py 3.9+ or Node 18+ | ~2026-Q1 | MIT |
| LlamaIndex | 38K | run-llama/llama_index | Py | CPU (calls LLM APIs) | Py 3.9+ | ~2026-Q1 | MIT |
| Haystack | 20K | deepset-ai/haystack | Py | CPU, CUDA (local models) | Py 3.9+ | ~2026-Q1 | Apache 2.0 |
| RAGFlow | 65K | infiniflow/ragflow | Py | CPU, CUDA | Docker, 16GB+ RAM | ~2026-Q1 | Apache 2.0 |
| txtai | 10K | neuml/txtai | Py | CPU, CUDA, MPS | Py 3.9+, torch | ~2025-Q4 | Apache 2.0 |
| LLMWare | 8K | llmware-ai/llmware | Py | CPU, CUDA, MPS | Py 3.9+ | ~2025-Q4 | Apache 2.0 |
| Flowise | 35K | FlowiseAI/Flowise | TS | CPU (calls LLM APIs) | Node 18+ | ~2026-Q1 | Apache 2.0 |
| R2R | 4K | SciPhi-AI/R2R | Py | CPU, CUDA | Py 3.10+, Docker | ~2025-Q4 ⚠ | MIT |
| Pathway | 4K | pathwaycom/pathway | Py/Rust | CPU | Py 3.10+, Linux/Mac | ~2026-Q1 | BSL 1.1 |
| Morphik | 2K | morphik-ai/morphik-core | Py | CPU, CUDA | Py 3.10+ | ~2025-Q4 ⚠ | Apache 2.0 |

Note: Most RAG frameworks are orchestrators — they call external LLM/embedding APIs and don't run models themselves. "CPU" means the framework runs on CPU; GPU usage depends on the model server it calls (Ollama, vLLM, etc.).
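To make that delegation concrete, here is a minimal sketch of the generate step, assuming a local Ollama server on its default port with an already-pulled model. The model name and prompt template are illustrative, and the retrieved chunks would come from the vector layer above.

```python
import requests

def answer_with_context(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a grounded prompt and delegate generation to a local Ollama server.

    Assumes Ollama is running on its default port (11434) and that the model named
    below has been pulled; any other backend a framework supports works the same way.
    """
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(retrieved_chunks) +
        f"\n\nQuestion: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# The framework process stays on CPU; the GPU work (Metal or CUDA) happens inside
# the model server handling this request, not in the orchestration code.
```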
Purpose: Add entity-relationship structure on top of vector retrieval. Answers questions that require connecting dots across documents ("who reported to whom during which project").

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| GraphRAG | 20K | microsoft/graphrag | Py | CPU (calls LLM APIs) | Py 3.10+ | ~2026-Q1 | MIT |
| LightRAG | 15K | HKUDS/LightRAG | Py | CPU, CUDA, MPS (needs patch) | Py 3.10+, 8GB+ RAM | 2026-02 | MIT |
| LinearRAG | — | DEEP-PolyU/LinearRAG | Py | CPU, CUDA | Py 3.9+. ICLR'26 | ~2025-Q4 ⚠ | MIT |
| LogicRAG | — | chensyCN/LogicRAG | Py | CPU, CUDA | Py 3.9+. AAAI'26 | ~2025-Q4 ⚠ | MIT |
| Cognee | 5K | topoteretes/cognee | Py | CPU (calls LLM APIs) | Py 3.10+, SQLite/LanceDB/Kuzu | ~2026-Q1 | Apache 2.0 |
| Neo4j | 14K | neo4j/neo4j | Java | CPU | JDK 17+, Docker, 2GB+ heap | ~2026-Q1 | GPL-3/Comm |
| nano-graphrag | 5K | gusye1234/nano-graphrag | Py | CPU | Py 3.9+, minimal deps | ~2025-Q3 | MIT |
| fast-graphrag | — | circlemind-ai/fast-graphrag | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | MIT |
| LangGraph | 10K | langchain-ai/langgraph | Py/JS | CPU | Py 3.9+ or Node 18+ | ~2026-Q1 | MIT |

Note: LightRAG MPS issue — the default HF embedding code checks torch.cuda.is_available() and falls back to CPU, ignoring MPS. Patch: add an elif torch.backends.mps.is_available() check. Not needed when routing embeddings through Ollama (which uses Metal natively).
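The patch in that note amounts to one extra branch in the device check. A paraphrased sketch of the shape of the fix (the exact function and file inside LightRAG's HuggingFace embedding helper may differ, so treat this as the idea rather than a verbatim diff):

```python
import torch

# Before: MPS-capable Macs silently fell through to CPU.
# After: probe CUDA first, then MPS, then fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():   # the added branch for Apple Silicon GPUs
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# The embedding model is then moved to the selected device, e.g.:
# embed_model = embed_model.to(device)
```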
Purpose: Retrieval engines that solve specific problems — hybrid search, code search, cache-augmented generation, or token-level reranking.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| QMD | — | tobi/qmd | TS | CPU (node-llama-cpp) | Node 18+, ~2GB GGUF models | ~2026-Q1 ⚠ | MIT |
| RAGatouille | 3K | bclavie/RAGatouille | Py | CPU, CUDA, MPS | Py 3.9+, torch (ColBERT) | ~2025-Q4 | Apache 2.0 |
| RAG-Anything | — | HKUDS/RAG-Anything | Py | CPU, CUDA | Py 3.10+ | ~2025-Q4 ⚠ | MIT |
| Meilisearch | 48K | meilisearch/meilisearch | Rust | CPU | Docker/binary, low RAM | ~2026-Q1 | MIT |
| FlashRAG | — | RUC-NLPIR/FlashRAG | Py | CPU, CUDA, MPS | Py 3.9+, torch | ~2025-Q4 ⚠ | MIT |
| PageIndex | — | VectifyAI/PageIndex | Py | CPU ⚠ | Py 3.9+ | ~2025-Q3 ⚠ | ⚠ |
| REFRAG | — | simulanics/REFRAG | Py | CPU ⚠ | Py 3.9+ | ~2025-Q3 ⚠ | ⚠ |
| CAG | — | hhhuang/CAG | Py | CPU | Py 3.9+ | ~2025-Q3 ⚠ | ⚠ |
| GitNexus | — | nxpatterns/gitnexus | Py | CPU ⚠ | Py 3.9+ | ~2025-Q3 ⚠ | ⚠ |

Purpose: Convert raw files (PDFs, HTML, images, web pages) into clean, chunked text that vector DBs and RAG frameworks can ingest. Garbage in, garbage out — this layer determines retrieval quality.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| Unstructured | 10K | Unstructured-IO/unstructured | Py | CPU, CUDA (OCR models) | Py 3.9+, tesseract/poppler opt | ~2026-Q1 | Apache 2.0 |
| Docling | 18K | DS4SD/docling | Py | CPU, CUDA, MPS | Py 3.10+, torch | ~2026-Q1 | MIT |
| Firecrawl | 30K | mendableai/firecrawl | TS | CPU | Node 18+ or SaaS API | ~2026-Q1 | AGPL-3.0 |
| Marker | 20K | VikParuchuri/marker | Py | CPU, CUDA, MPS | Py 3.10+, torch | ~2026-Q1 | GPL-3.0 |
| Crawl4AI | 35K | unclecode/crawl4ai | Py | CPU | Py 3.9+, Playwright | ~2026-Q1 | Apache 2.0 |
| PaperQA | 7K | Future-House/paper-qa | Py | CPU (calls LLM APIs) | Py 3.11+ | ~2026-Q1 | Apache 2.0 |

Purpose: Take the top-N candidates from initial retrieval and re-score them with a more expensive model. Bridges the gap between "vaguely relevant" (vector search) and "actually useful" (cross-encoder scoring).

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| ColBERT | 4K | stanford-futuredata/ColBERT | Py | CUDA (strong rec), CPU | Py 3.8+, torch | ~2025-Q2 | MIT |
| FlashRank | 2K | PrithivirajDamodaran/FlashRank | Py | CPU | Py 3.8+, <100MB models | ~2025-Q3 | Apache 2.0 |
| FlagEmbedding | 8K | FlagOpen/FlagEmbedding | Py | CPU, CUDA, MPS | Py 3.8+, torch | ~2026-Q1 | MIT |
| Cohere Rerank | — | cohere.com/rerank | SaaS | Any | API key | Active | Proprietary |
| zerank-2 | — | HuggingFace GGUF | C++ | CPU, Metal, CUDA, ROCm, Vulkan | llama.cpp server, 2-8GB model | N/A | ⚠ |

zerank-2 on your stack: Runs via llama.cpp on the host Mac Studio → Metal backend → M4 Max GPU. The reranker is served at 127.0.0.1:8090, and ClawRAG calls it via HTTP after ChromaDB returns initial candidates.
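As an illustration of that second-stage scoring, here is a minimal sketch using a cross-encoder through sentence-transformers (covered in the model-serving table below); the zerank-2 setup in the note above does the same job, but over HTTP to a llama.cpp server rather than in-process. The model name and candidate texts are illustrative.

```python
from sentence_transformers import CrossEncoder

# Stage 1 (not shown): a vector DB returns top-N loosely relevant candidates.
query = "Which vector store runs inside Postgres?"
candidates = [
    "Milvus is a distributed vector database deployed via Docker or Kubernetes.",
    "pgvector adds vector similarity search inside PostgreSQL 12+.",
    "Meilisearch is a fast keyword search engine written in Rust.",
]

# Stage 2: a cross-encoder scores each (query, candidate) pair jointly.
# In this section's terms, this is the MPS/CUDA path (PyTorch), not native Metal.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # illustrative model choice
scores = reranker.predict([(query, c) for c in candidates])

for score, text in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```

Cross-encoders read the query and candidate together, so they catch relevance cues that independently computed embeddings miss, at the cost of running the model once per candidate.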
Purpose: Run LLMs and embedding models locally or serve them via API. The compute layer. This is where GPU acceleration (Metal, CUDA) actually matters.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| Ollama | 110K | ollama/ollama | Go | CPU, Metal, CUDA, ROCm | macOS/Linux/Win, 8GB+ RAM | ~2026-Q1 | MIT |
| llama.cpp | 80K | ggml-org/llama.cpp | C/C++ | CPU, Metal, CUDA, ROCm, Vulkan, SYCL | cmake, C++17. Multi-platform | ~2026-Q1 | MIT |
| vLLM | 50K | vllm-project/vllm | Py | CUDA only | Py 3.9+, CUDA 12+, 16GB+ VRAM | ~2026-Q1 | Apache 2.0 |
| TEI | 3K | huggingface/text-embeddings-inference | Rust | CPU, CUDA | Docker/Rust build, Linux rec | ~2026-Q1 | Apache 2.0 |
| sentence-transformers | 16K | UKPLab/sentence-transformers | Py | CPU, CUDA, MPS | Py 3.8+, torch | ~2026-Q1 | Apache 2.0 |
| FlagEmbedding | 8K | FlagOpen/FlagEmbedding | Py | CPU, CUDA, MPS | Py 3.8+, torch | ~2026-Q1 | MIT |

Metal vs MPS here: Ollama and llama.cpp use Metal (native shaders, no PyTorch, fastest path). sentence-transformers and FlagEmbedding use MPS (PyTorch → Metal Performance Shaders → Metal, convenient but slower). vLLM has no Apple Silicon support — CUDA only.

Your inference path: Ollama → llama.cpp → Metal shaders → M4 Max GPU cores (no PyTorch, no MPS, no MLX in the critical path).

Purpose: Give AI agents persistent memory across sessions. Extract facts from conversations, store them durably, retrieve when relevant. The "remember me" layer.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License | Deploy |
|---|---|---|---|---|---|---|---|---|
| Mem0 | 48K | mem0ai/mem0 | Py/JS | CPU (calls LLM APIs) | Py 3.9+/Node. Default: gpt-4.1-nano | 2026-Q1 | Apache 2.0 | Both |
| TeleMem | — | TeleAI-UAGI/TeleMem | Py | CPU (calls LLM APIs) | Py 3.9+. Drop-in Mem0 replacement | ~2026-Q1 | ⚠ | Self |
| Letta | 15K | letta-ai/letta | Py | CPU (calls LLM APIs) | Py 3.10+. Runs as server | ~2026-Q1 | Apache 2.0 | Both |
| Supermemory | 5K | supermemoryai/supermemory | TS | CPU (internal engine) | npm/pip. MCP server. <300ms recall | ~2026-Q1 | Source-avail | Cloud/Ent |
| MemOS | — | MemTensor/MemOS | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | ⚠ | Self |
| MemMachine | — | MemMachine/MemMachine | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | Apache 2.0 | Self |
| SuperLocalMemory | — | qualixar/superlocalmemory | Py | CPU | Py 3.9+. Zero cloud (Mode A) | ~2026-Q1 | ⚠ | Self |
| Cognee | 5K | topoteretes/cognee | Py | CPU (calls LLM APIs) | Py 3.10+. SQLite+LanceDB+Kuzu | ~2026-Q1 | Apache 2.0 | Self |
| EverMemOS | — | EverMind-AI/EverMemOS | Py | CPU | Py 3.9+ | ~2026-Q1 ⚠ | ⚠ | Self |

Note: Memory frameworks are almost universally CPU-only. They store/retrieve/manage memories and call external LLM APIs for extraction/reasoning. GPU usage is delegated to whatever model server they're configured to use (Ollama, OpenAI, etc.).

Purpose: Track conversation history, temporal facts, and episodic sequences. Optimized for "what happened when" rather than "find similar content."

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License | Deploy |
|---|---|---|---|---|---|---|---|---|
| MemPalace | 2K | milla-jovovich/mempalace | Py | CPU | Py 3.9+, ChromaDB, SQLite | ~2025-Q4 ⚠ | ⚠ | Self |
| Zep/Graphiti | 8K | getzep/graphiti | Py | CPU | Py 3.10+, Neo4j | ~2026-Q1 | Apache 2.0 | Both |
| Honcho | — | plastic-labs/honcho | Py | CPU | Py 3.10+ | ~2025-Q4 ⚠ | ⚠ | Self |
| Memobase | — | memodb-io/memobase | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | ⚠ | Both |
| Hindsight | 3K | vectorize-io/hindsight | Py/TS/Go | CPU | 1 Docker cmd. Embedded Postgres | ~2026-Q1 | MIT | Self |
| Second Me | 10K | mindverse/Second-Me | Py | CPU, CUDA ⚠ | Py 3.9+ | ~2025-Q4 ⚠ | ⚠ | Self |
| MIRIX | — | Mirix-AI/MIRIX | Py | CPU | Py 3.9+ | ~2025-Q3 ⚠ | ⚠ | Self |
| MemU | — | NevaMind-AI/memU | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | ⚠ | Self |
| ReMe | — | modelscope/MemoryScope | Py | CPU | Py 3.9+ | ~2025-Q4 ⚠ | Apache 2.0 | Self |

Purpose: Persist context for coding agents (Claude Code, OpenClaw). Remember project state, decisions, and working patterns across sessions.

| Project | ★ | GitHub | Lang | Accel | Env | Updated | License |
|---|---|---|---|---|---|---|---|
| claude-mem | — | thedotmack/claude-mem | TS | CPU | Node 18+, Claude Code plugin | ~2026-Q1 | ⚠ |
| Memov | — | memov-io/memov | TS | CPU | Node 18+, Git, Claude Code | ~2026-Q1 ⚠ | ⚠ |
| LangMem | — | langchain-ai/langmem | Py | CPU | Py 3.9+, LangGraph required | ~2026-Q1 | MIT |
| OpenMemory | — | caviraoss/openmemory | Py | CPU | Py 3.9+, MCP-native | ~2025-Q4 ⚠ | ⚠ |
| Memori | — | memorilabs/memori | Py | CPU | Py 3.9+, SQL-native | ~2025-Q4 ⚠ | ⚠ |

Purpose: Research systems from papers —

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
