Civic-SLM is a domain-specialized fine-tune of Qwen2.5-7B for U.S. govt data
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Civic-SLM is a fine-tuned model based on Qwen2.5-7B, specialized for processing PDF documents from U.S. local governments and designed for building civic transparency tools. It was trained on a single Apple Silicon Mac and is released free of charge for a variety of runtimes, including MLX and GGUF, so it can be used to analyze municipal documents across all 50 states.

Body
Civic-SLM

Civic-SLM is a domain-specialized fine-tune of Qwen2.5-7B-Instruct for U.S. local-government documents: city, county, and township agendas, staff reports, comprehensive plans, minutes, ordinances, and municipal codes. It is designed to power civic transparency tools across all 50 states. Trained on a single Apple Silicon Mac via MLX-LM, it can be served on whatever runtime you like (MLX, Ollama, LM Studio, llama.cpp, or any OpenAI-compatible endpoint) and is released as both MLX-q4 and GGUF Q5_K_M. Documents are crawled with browser-use, one small recipe per jurisdiction.

Why

Local government is where most public decisions actually get made, and the documents that drive those decisions (agendas, staff reports, minutes, ordinances) are mostly PDFs buried on legacy CMSes. General-purpose LLMs can read them, but they hallucinate specifics, miss citations, and don't know the genre. Civic-SLM is a small, open, auditable model trained specifically on this corpus so it can ground answers in the source text, extract structured data from staff reports, and refuse when the context doesn't support an answer.

Pipeline

- Crawl: one browser-use recipe per jurisdiction (San Clemente, CA ships as the demo; recipes are tiny and composable for any U.S. city, county, or township).
- Chunk: Pydantic-validated DocumentChunk schemas with provenance.
- Synthesize: generate training pairs via the Anthropic SDK or a fully local LLM backend (env-switchable).
- Train: continued pre-training (CPT), supervised fine-tuning (SFT), and direct preference optimization (DPO) on MLX.
- Merge & quantize: the final adapter is merged and quantized to MLX-q4 and GGUF Q5_K_M.
- Eval: every stage is reported to W&B and compared against the committed baselines.

Eval-first

The training contract is: no training without a baseline. Four benchmarks run against base Qwen2.5-7B before any fine-tuning starts; those numbers are what every subsequent stage has to beat.
| Bench | What it measures | Score |
|---|---|---|
| civic_factuality | Q&A grounded in held-out docs | citation exact-match + word-overlap |
| refusal | refuses when context lacks the answer | refusal rate (regex + fallback judge) |
| structured_extraction | staff report → JSON | field-level F1 |
| side_by_side | open-ended municipal prompts vs base 7B and 72B | Claude or local-LLM judge with A/B position swap |

Baseline numbers (Qwen2.5-7B-Instruct 4-bit, MLX):

| Bench | n | Mean | Median | Latency |
|---|---|---|---|---|
| factuality | 10 | 0.501 | 0.566 | 637 ms |
| refusal | 10 | 0.800 | 1.000 | 460 ms |
| extraction | 5 | 0.277 | 0.000 | 925 ms |
| side_by_side | — | — (pending 72B comparator) | — | — |

Quickstart

    uv sync --all-extras
    uv run pytest   # 42 tests across schema, ingest, scorers, synth, train, llm-backend
    uv run civic-slm --help

The civic-slm umbrella CLI exposes every stage: doctor, crawl, eval run, eval side-by-side, and train cpt|sft|dpo. See the repo's docs/USAGE.md for an end-to-end walkthrough and docs/RECIPES.md to add a new jurisdiction.

Status

Scaffold, schemas, ingestion (browser-use + San Clemente demo recipe + a template for any U.S. jurisdiction), the 4-bench eval harness, the synth pipeline (Anthropic or fully local backend), MLX training scripts (CPT/SFT/DPO), merge + quantize to MLX-q4 and GGUF Q5_K_M, runtime-agnostic serving, and committed baselines for factuality, refusal, and extraction are all in place. Next up: the synth corpus and the first training pass.
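The extraction bench above scores predicted JSON against gold annotations with field-level F1. A minimal sketch of such a scorer, assuming exact match after whitespace and case normalization (the project's actual matching rules may differ):

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Field-level F1 between a predicted and a gold record.

    A field is a true positive when its normalized predicted value equals
    the normalized gold value. The normalization rule (collapse whitespace,
    lowercase) is an assumption, not Civic-SLM's exact scorer.
    """
    norm = lambda v: " ".join(str(v).split()).lower()
    # True positives: predicted fields that match the gold value.
    tp = sum(1 for k, v in predicted.items()
             if k in gold and norm(v) == norm(gold[k]))
    # False positives: predicted fields that are absent or wrong.
    fp = len(predicted) - tp
    # False negatives: gold fields the prediction missed or got wrong.
    fn = sum(1 for k in gold
             if k not in predicted or norm(predicted[k]) != norm(gold[k]))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = {"project": "Pier Repair", "budget": "$1.2M", "vote": "5-0"}
pred = {"project": "pier repair", "budget": "$1.2M", "vote": "4-1"}
print(round(field_f1(pred, gold), 3))  # → 0.667 (2 of 3 fields match)
```

Averaging this score over the 5 held-out staff reports would yield a single number comparable to the 0.277 extraction baseline in the table above.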
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.