Chonkify – RAG and agent compression up to 4x better than LLMLingua

hackernews | 📦 Open source
#llmlingua #openai #rag #document compression #agents #extractive summarization #hardware/semiconductors
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Developed for RAG and agent workflows, chonkify is a document compression tool focused on preserving important facts within a token budget. According to the project's benchmarks, it achieves more than 4x higher fact recall than Microsoft's LLMLingua family, particularly on fact-heavy quantitative research and reasoning documents. Across the full test set it saves about 75.2% of source tokens, and it supports multiple operating systems as well as both local and OpenAI embedding backends.

Full text

Extractive document compression that actually preserves what matters. chonkify compresses long documents into tight, information-dense context for RAG pipelines, agent memory, and any workflow where token budget matters as much as factual recovery.

This release focuses on strong factual recovery under hard token budgets across general txt/md and fact-heavy document workloads. Today, the clearest validated fit is content-dense non-PDF text: quantitative research digests, structured engineering notes, and reasoning traces where downstream models need exact facts more than fluent paraphrase. It remains a general-purpose document compressor, but this is the workload family where the current release is strongest.

By Thomas "Thom" Heinrich · chonkyDB.com

Most compression tools optimize for token reduction. chonkify optimizes for information recovery: the compressed output retains the facts, structure, and reasoning that downstream models actually need. On the current release corridors against Microsoft's LLMLingua family:

| Suite | chonkify | LLMLingua | LLMLingua2 |
|---|---|---|---|
| general txt/md (20 cases), fact_recall_mean | 0.8833 | 1.0000 | 0.8667 |
| general txt/md, budget_ok_rate | 1.0000 | 0.0000 | 0.3500 |
| fact-heavy quant/reasoning (22 cases), fact_recall_mean | 0.5606 | 0.1061 | 0.1212 |
| fact-heavy quant/reasoning, budget_ok_rate | 1.0000 | 0.2727 | 0.1364 |

Across both suites combined, chonkify currently saves 75.20% of source tokens, versus 62.95% for LLMLingua and 62.76% for LLMLingua2. Full methodology and caveats are in BENCHMARKS.md.

chonkify builds source-faithful document units, scores them through a strict 768-dimensional embedding interface, and returns a compact output that respects your token budget. Performance-sensitive parts of the implementation ship as compiled extension modules.
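The pipeline described above (build document units, score them through a fixed-dimension embedding interface, then select under a hard token budget) can be sketched roughly as follows. This is an illustrative approximation only, not chonkify's actual implementation: the toy bag-of-words embedding, the centroid-similarity scoring, the whitespace token count, and the greedy selection strategy are all assumptions standing in for the real compiled machinery.

```python
import math
import re

def embed(text, dim=768):
    """Toy deterministic bag-of-words embedding, a stand-in for a real
    768-dimensional sentence embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def compress(text, target_tokens):
    """Greedy extractive compression: split the document into sentence
    units, score each against the whole-document embedding, keep the
    best-scoring units that fit the token budget, and emit them in
    their original order (source-faithful output)."""
    units = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    doc_vec = embed(text)
    ranked = sorted(
        enumerate(units),
        key=lambda iu: cosine(embed(iu[1]), doc_vec),
        reverse=True,
    )
    chosen, budget = [], target_tokens
    for idx, unit in ranked:
        cost = len(unit.split())  # crude whitespace token count
        if cost <= budget:
            chosen.append(idx)
            budget -= cost
    return " ".join(units[i] for i in sorted(chosen))
```

The key property mirrored from the description is that output is extractive (every emitted unit is a verbatim span of the source) and the budget is a hard constraint, never exceeded.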
This refreshed handoff includes the current native cp311 wheel matrix for the supported desktop/server targets:

```shell
# Linux x86_64
pip install ./chonkify-0.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl

# Windows amd64
py -3.11 -m pip install .\chonkify-0.3.0-cp311-cp311-win_amd64.whl

# macOS arm64
python3.11 -m pip install ./chonkify-0.3.0-cp311-cp311-macosx_11_0_arm64.whl

# macOS x86_64
python3.11 -m pip install ./chonkify-0.3.0-cp311-cp311-macosx_10_9_x86_64.whl
```

These four wheels were produced by the native GitHub Actions matrix run 23559149680, and the Linux manylinux artifact was revalidated afterwards with a fresh-venv ci/wheel_smoke.py install smoke test.

For local CPU/GPU embeddings (no API calls), also install:

```shell
pip install sentence-transformers
```

Or use the optional extra:

```shell
pip install chonkify[local]
```

Basic compression:

```shell
chonkify compress ./paper.pdf \
  --target-tokens 1200 \
  --output ./paper_compressed.txt \
  --metadata-out ./paper_meta.json
```

Multiple documents in one pass:

```shell
chonkify compress ./brief.md ./appendix.pdf \
  --target-tokens 1400 \
  --output ./bundle.txt
```

Pipe from stdin:

```shell
cat ./notes.txt | chonkify compress - --target-tokens 900 --output -
```

Python API:

```python
from chonkify import compress_documents

# With additional control over embedding providers:
from chonkify import (
    LocalEmbeddingConfig,
    LocalSentenceTransformerEmbeddingProvider,
    OpenAIEmbeddingConfig,
    OpenAIEmbeddingProvider,
    compress_documents,
)
```

Minimal example:

```python
from chonkify import compress_documents

result = compress_documents(
    ["Quarterly revenue rose 18%. Operating margin expanded to 27%. Guidance remains unchanged."],
    target_tokens=24,
)
print(result.compressed_text)
print(result.compressed_tokens)
```

Azure OpenAI / OpenAI configuration:

```shell
export AZURE_OPENAI_ENDPOINT="https://.openai.azure.com/"
export AZURE_OPENAI_API_KEY=""
export AZURE_OPENAI_API_VERSION="2024-10-21"
export CHONKIFY_AZURE_EMBEDDING_DEPLOYMENT=""

export OPENAI_API_KEY=""
export CHONKIFY_OPENAI_EMBEDDING_MODEL="text-embedding-3-large"

chonkify compress ./paper.pdf --embedding-backend openai --target-tokens 1200
```

For providers like Together, Fireworks, or self-hosted APIs:

```shell
export OPENAI_API_KEY=""
export CHONKIFY_OPENAI_BASE_URL="https:///v1"
export CHONKIFY_OPENAI_EMBEDDING_MODEL=""

chonkify compress ./paper.pdf --embedding-backend openai-compatible --target-tokens 1200
```

If your endpoint rejects the `dimensions` parameter, add `--openai-omit-dimensions-parameter`. chonkify still validates 768-dimensional output.

The local backend is fully offline after the first model download. Default model: sentence-transformers/all-mpnet-base-v2.

```shell
chonkify compress ./paper.pdf \
  --embedding-backend local \
  --local-device cuda \
  --target-tokens 1200
```

Device options: `cpu`, `cuda`, `cuda:0`, `mps`. Validated with sentence-transformers 5.1.0 and torch 2.8.0+cu128 on an NVIDIA RTX 3090. Cold-cache run: ~13 s. Warm-cache run: ~6 s. Model footprint: ~419 MB. With `HF_HUB_OFFLINE=1`, the local backend runs fully air-gapped once cached.

The optional `--metadata-out` JSON includes:

- Original and compressed token counts
- Compression factor and token reduction percentage
- Selected source blocks with source IDs and ranks
- Embedding provider used

If you pass `--query`, it is preserved in the metadata.
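The metadata file lends itself to a quick post-run sanity check on compression savings. A minimal sketch follows; the JSON key names (`original_tokens`, `compressed_tokens`) are hypothetical placeholders for illustration, so consult the file chonkify actually writes for the real schema.

```python
import json

def summarize_metadata(path):
    """Read a --metadata-out style JSON file and derive the compression
    factor and token reduction percentage from the token counts.
    NOTE: the field names below are assumptions, not chonkify's
    documented schema."""
    with open(path) as f:
        meta = json.load(f)
    original = meta["original_tokens"]
    compressed = meta["compressed_tokens"]
    return {
        "compression_factor": original / compressed,
        "token_reduction_pct": round(100.0 * (1 - compressed / original), 2),
    }
```

For example, 1000 source tokens compressed to 248 tokens yields a 75.2% token reduction, matching the scale of the combined-suite savings reported above.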

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
