Show HN: OpenFable – An open-source RAG engine using a tree-structured index

hackernews | 📦 Open Source
#ai-deal #anthropic #claude #llama #llm #openai #openfable #rag #search-engine #open-source
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

OpenFable is an open-source RAG retrieval engine that, instead of simply splitting documents into flat segments, uses an LLM to build a hierarchical semantic forest index, improving accuracy on complex retrieval tasks that involve multi-step queries or require contextual understanding. Within a user-specified token budget, it returns the best-matching chunks through a bi-path approach that combines an LLM-based logical reasoning path with a vector-similarity search path. It is focused purely on retrieval, with no built-in answer generation; developers connect to the system through the REST API or an MCP client and then attach their own LLM to produce the final output.

Body

An open-source retrieval engine implementing FABLE (Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval). OpenFable accepts documents as raw text, builds LLM-enhanced semantic forest indexes, and retrieves relevant content through bi-path retrieval with adaptive budget control. Retrieval only -- OpenFable returns ranked chunks, not generated answers. Bring your own LLM for generation.

```bash
export OPENAI_API_KEY=sk-...
docker compose up -d
```

Connect any MCP client to http://localhost:8000/v1/mcp/sse -- Claude Desktop, Cursor, or your own agent:

```bash
pip install mcp-use langchain-openai
```

```python
import asyncio

from langchain_openai import ChatOpenAI
from mcp_use import MCPAgent, MCPClient

async def main():
    client = MCPClient.from_dict({
        "mcpServers": {"openfable": {"url": "http://localhost:8000/v1/mcp/sse"}}
    })
    agent = MCPAgent(llm=ChatOpenAI(), client=client, max_steps=10)
    print(await agent.run(
        "Ingest this document: The Eye of Kurak was discovered by "
        "archaeologist Lena Voss in 1923 beneath the ruins of Kurak."
    ))
    print(await agent.run("Search the indexed documents: Who discovered the Eye of Kurak?"))

asyncio.run(main())
```

A REST API is also available -- see the API Reference or the OpenAPI docs at http://localhost:8000/docs.

Most RAG systems chunk documents into flat segments and retrieve by vector similarity. This works for simple queries but breaks down when:

- A question spans multiple sections of a document
- The answer requires understanding how sections relate to each other
- You need to control how many tokens you send to the LLM
- Relevant content is buried in a subsection that doesn't match the query's surface-level keywords

|  | Fixed-size chunking | Semantic chunking | RAPTOR | FABLE |
|---|---|---|---|---|
| Chunk boundaries | Token count | Embedding similarity | Token count | LLM-identified discourse breaks |
| Index structure | Flat | Flat | Bottom-up tree (clustering) | Top-down tree (LLM-generated hierarchy) |
| Retrieval | Vector only | Vector only | Vector over tree layers | Bi-path: LLM reasoning + vector with tree propagation |
| Budget control | None | None | None | Token budget with adaptive document/node routing |

FABLE solves this by building a semantic forest -- a tree structure where each document becomes a hierarchy of nodes (root, sections, subsections, leaves). Retrieval then uses two complementary paths at each level:

- LLM-guided path -- an LLM reasons about which documents and subtrees are relevant based on their summaries and table-of-contents structure
- Vector path -- embedding similarity search over the same tree nodes, with structure-aware score propagation (TreeExpansion)

Results from both paths are fused, deduplicated, and trimmed to fit within a token budget you specify.

When you POST a document, OpenFable runs four steps:

- Semantic chunking -- an LLM identifies discourse boundaries and splits the text into coherent chunks (not fixed-size windows)
- Tree construction -- chunks are organized into a hierarchical tree; the LLM generates summaries for internal nodes, creating a table-of-contents-like structure
- Multi-granularity embedding -- every node (root, section, subsection, leaf) gets a BGE-M3 embedding; internal nodes embed their toc_path + summary, leaves embed their raw content
- Indexing -- embeddings are stored in pgvector with HNSW indexes for fast similarity search

When you POST a query with a token_budget, retrieval proceeds in stages.

Document level -- which documents matter?

- LLMselect: the LLM sees shallow tree nodes (toc paths + summaries) and scores document relevance
- Vector top-K: cosine similarity search over internal node embeddings, aggregated to document level
- Results are fused (union, max-score)

Budget routing -- if the fused documents fit within your token budget, their full content is returned. If not, retrieval drills down to node level.

Node level -- which chunks matter?

- LLMnavigate: the LLM sees the full tree hierarchy and selects relevant subtree roots
- TreeExpansion: structure-aware scoring using S(v) = 1/3 [S_sim + S_inh + S_child] -- similarity with depth decay, ancestor inheritance, and child aggregation propagate relevance through tree edges (see the sketch after this section)
- Results are fused with LLM-guided nodes getting priority, then greedily selected up to the token budget

The result: you get the most relevant chunks, in document order, within your token budget -- using both LLM reasoning and structural context, not just embedding distance.
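The README quotes the TreeExpansion formula but not the exact definitions of its three terms, so the snippet below is only a minimal sketch of structure-aware score propagation under assumed definitions: exponential depth decay for S_sim, the parent's similarity for S_inh (ancestor inheritance), and the mean similarity of direct children for S_child (child aggregation). The Node class, cosine helper, and decay constant are illustrative and are not part of OpenFable's code.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class Node:
    embedding: np.ndarray            # vector for this tree node (BGE-M3 in OpenFable)
    depth: int                       # 0 = document root
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def tree_expansion_score(v: Node, query: np.ndarray, decay: float = 0.9) -> float:
    """Sketch of S(v) = 1/3 * [S_sim + S_inh + S_child] with assumed term definitions."""
    # S_sim: query similarity, discounted the deeper the node sits in the tree
    s_sim = cosine(v.embedding, query) * (decay ** v.depth)
    # S_inh: inherit relevance from the ancestor path (here: the parent's raw similarity)
    s_inh = cosine(v.parent.embedding, query) if v.parent is not None else s_sim
    # S_child: aggregate relevance of direct children (here: their mean similarity)
    s_child = (
        sum(cosine(c.embedding, query) for c in v.children) / len(v.children)
        if v.children else s_sim
    )
    return (s_sim + s_inh + s_child) / 3.0
```

In OpenFable itself these scores would presumably be computed over the BGE-M3 node embeddings stored in pgvector and then fused with the LLMnavigate selections before the greedy cut at the token budget.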
```mermaid
flowchart LR
    client([Developer / RAG App])
    api["OpenFable API<br/>FastAPI + Python 3.12"]
    db["PostgreSQL 17 + pgvector"]
    embeddings["Embeddings<br/>TEI / OpenAI"]
    llm["LLM Provider<br/>Anthropic / OpenAI / Ollama"]
    client -- "REST /v1/api" --> api
    client -- "MCP /v1/mcp" --> api
    api -- "SQLAlchemy" --> db
    api -- "/v1/embeddings" --> embeddings
    api -- "LiteLLM" --> llm
```

All settings are controlled by environment variables (no .env file). Set your LLM provider's API key directly -- OpenFable uses LiteLLM and reads the standard provider variables:

| Provider | Environment variable | Model example |
|---|---|---|
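For REST usage, the excerpt above only shows the /v1/api prefix (from the architecture diagram) and the token_budget parameter, so the routes and payload fields in this sketch (/v1/api/documents, /v1/api/search, text, query, chunks) are placeholders rather than OpenFable's documented API; the real schema is in the OpenAPI docs at http://localhost:8000/docs.

```python
# Illustrative only: route names and response shape are assumptions, not
# OpenFable's documented API. Verify against http://localhost:8000/docs.
import requests

BASE = "http://localhost:8000/v1/api"   # prefix taken from the architecture diagram

# Ingest a raw-text document (hypothetical route and payload)
doc = requests.post(f"{BASE}/documents", json={
    "text": "The Eye of Kurak was discovered by archaeologist Lena Voss in 1923 "
            "beneath the ruins of Kurak.",
})
doc.raise_for_status()

# Query with an explicit token budget; token_budget is the field the README
# describes, while the route and other field names are guesses
resp = requests.post(f"{BASE}/search", json={
    "query": "Who discovered the Eye of Kurak?",
    "token_budget": 2000,
})
resp.raise_for_status()
for chunk in resp.json().get("chunks", []):   # ranked chunks, not generated answers
    print(chunk)
```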

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
