LLM Router – an MCP server that routes Claude Code tasks to cheaper models
hackernews
📦 Open source
#ai deals
#ai 라우팅
#anthropic
#claude
#gemini
#gpt-4
#llama
#llm
#mcp
#perplexity
#cost optimization
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
LLM Router is a tool that, through a single MCP server, automatically routes text, image, video, and other tasks to the best-suited of 20+ AI models based on each task's characteristics. By assigning near-free models to simple tasks and high-performance models to complex ones, it can cut API costs by roughly 70–85%. Routing decisions, classified automatically according to the configured budget and goals, can be monitored in real time through a web dashboard.
Full text
One MCP server. Every AI model. Smart routing.

Route text, image, video, and audio tasks to 20+ AI providers — automatically picking the best model for the job based on your budget and active profile.

Quick Start • How It Works • Providers • Tools • Configuration • Provider Setup

You use Claude Code. You also have GPT-4o, Gemini, Perplexity, DALL-E, Runway, ElevenLabs — but switching between them is manual, slow, and expensive. LLM Router gives your AI assistant one unified interface to all of them — and automatically picks the right one based on what you're doing and what you can afford.

- You: "Research the latest AI funding rounds" → Router: Perplexity Sonar Pro (search-augmented, best for current facts)
- You: "Generate a hero image for the landing page" → Router: Flux Pro via fal.ai (best quality/cost for images)
- You: "Write unit tests for the auth module" → Router: Claude Sonnet (top coding model, within budget)
- You: "Create a 5-second product demo clip" → Router: Kling 2.0 via fal.ai (best value for short video)

Not every task needs the same model. Without a router, everything goes to the same expensive model — like hiring a surgeon to change a lightbulb.

- "What does os.path.join do?" → Gemini Flash ($0.000001 — literally free)
- "Refactor the auth module" → Claude Sonnet ($0.003)
- "Design the full system arch" → Claude Opus ($0.015)

| Task type | Without Router | With Router | Savings |
|---|---|---|---|
| Simple queries (60% of work) | Opus — $0.015 | Haiku/Gemini Flash — $0.0001 | 99% |
| Moderate tasks (30% of work) | Opus — $0.015 | Sonnet — $0.003 | 80% |
| Complex tasks (10% of work) | Opus — $0.015 | Opus — $0.015 | 0% |
| Blended monthly estimate | ~$50/mo | ~$8–15/mo | 70–85% |

💡 With Ollama: simple tasks route to a free local model — those 60% of queries cost $0.

Zero API keys required — if you have a Claude Code subscription, the router works out of the box. Simple tasks route to Claude Haiku (included), complex ones escalate to Sonnet/Opus.
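The tiering above can be sketched as a toy classifier: bucket each prompt by cheap keyword heuristics, then map the bucket to a model. This is an illustrative sketch, not the router's actual logic; the model names, hint lists, and word-count threshold are all assumptions.

```python
# Illustrative sketch of tiered routing (NOT the router's real logic):
# classify each prompt with cheap keyword heuristics, then map the
# resulting tier to a model. Model names and hint lists are assumptions.

MODEL_TIERS = {
    "simple": "gemini-flash",     # near-free: lookups, definitions
    "moderate": "claude-sonnet",  # refactors, unit tests, debugging
    "complex": "claude-opus",     # architecture, multi-step design
}

COMPLEX_HINTS = ("design", "architecture", "system")
MODERATE_HINTS = ("refactor", "implement", "unit tests", "debug")

def classify(prompt: str) -> str:
    """Bucket a prompt into simple/moderate/complex using keywords."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    if any(hint in text for hint in MODERATE_HINTS) or len(text.split()) > 40:
        return "moderate"
    return "simple"

def route(prompt: str) -> str:
    """Return the model chosen for a prompt's complexity tier."""
    return MODEL_TIERS[classify(prompt)]

print(route("What does os.path.join do?"))           # gemini-flash
print(route("Refactor the auth module"))             # claude-sonnet
print(route("Design the full system architecture"))  # claude-opus
```

A real router would add an escalation path (as the README's heuristic → Ollama → cheap-API classifier chain suggests) for prompts the keyword pass cannot confidently bucket.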
External providers (GPT-4o, Gemini, Perplexity) are optional add-ons.

Install with pipx:

```
pipx install claude-code-llm-router && llm-router install
```

Or with pip:

```
pip install claude-code-llm-router && llm-router install
```

Or as a Claude Code plugin:

```
claude plugin add ypollak2/llm-router
```

Or from source:

```
git clone https://github.com/ypollak2/llm-router.git
cd llm-router
uv sync
```

LLM Router is an MCP server — it works in any IDE that supports the Model Context Protocol.

Cursor — add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "llm-router": { "command": "llm-router", "args": [] }
  }
}
```

Windsurf — add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "llm-router": { "command": "llm-router", "args": [] }
  }
}
```

Zed — add to Zed's `settings.json`:

```json
{
  "context_servers": {
    "llm-router": { "command": { "path": "llm-router", "args": [] } }
  }
}
```

The MCP tools (`llm_query`, `llm_code`, `llm_research`, etc.) work identically in all IDEs. The auto-route hook is Claude Code-specific; other IDEs call the tools directly.

Make the router evaluate every prompt across all projects:

```
# From the MCP tool:
llm_setup(action='install_hooks')

# Or from the CLI:
llm-router install
```

This installs hooks + rules to `~/.claude/` so every Claude Code session auto-routes tasks to the optimal model.

Start for free: Google's Gemini API has a free tier with 1M tokens/day. Groq also offers a generous free tier with ultra-fast inference.
- 30 MCP tools — smart routing, text/code, image/video/audio, streaming, orchestration, usage monitoring, web dashboard
- Auto-route hook — intercepts every prompt before your top-tier model sees it; heuristic → Ollama → cheap-API classifier chain; hooks self-update on pip upgrade
- Claude subscription mode — routes entirely within your CC subscription; Codex (free) before paid externals; external only when quota is exhausted
- Anthropic prompt caching — auto-injects `cache_control` breakpoints on long system prompts; up to 90% savings on repeated context
- Semantic dedup cache — Ollama embeddings + cosine similarity skip identical-intent calls at zero cost
- Web dashboard — `llm-router dashboard` → `localhost:7337`; cost trends, model distribution, recent decisions
- Hard spend caps — `LLM_ROUTER_DAILY_SPEND_LIMIT` and `LLM_ROUTER_MONTHLY_BUDGET` raise before any call
- Prompt classification cache — SHA-256 LRU cache for instant repeat classifications
- Circuit breaker + health — catches 429s, marks unhealthy providers, auto-recovers
- Quality logging — records every routing decision; `llm_quality_report` shows accuracy, savings, downshift rate
- Cross-platform — macOS, Linux, Windows (desktop notifications, background processes, path handling)

The built-in web dashboard (`llm_dashboard` or `llm-router dashboard`) gives you a live view of routing decisions, cost trends, and subscription pressure.

[Dashboard screenshots: Overview, Performance, Logs & Analysis]

Design: Liquid Glass dark theme — Inter + JetBrains Mono, Material Symbols, Tailwind CSS. Auto-refreshes every 30 s.

The UserPromptSubmit hook intercepts all prompts before your top-tier model sees them.
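The semantic dedup idea in the feature list can be sketched as follows. A toy bag-of-words vector stands in for the Ollama embeddings so the idea is runnable without a model; a new prompt whose cosine similarity to a cached one clears a threshold reuses the cached answer instead of triggering an API call. The class name and the 0.9 threshold are illustrative assumptions, not the router's API.

```python
# Toy sketch of a semantic dedup cache. The real feature uses Ollama
# embeddings; here a bag-of-words Counter stands in so the idea runs
# without any model. The 0.9 threshold is an illustrative assumption.
import math
from collections import Counter

def embed(text):
    """Stand-in 'embedding': word-count vector of the prompt."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class DedupCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (vector, cached answer) pairs

    def lookup(self, prompt):
        """Return a cached answer if some entry is similar enough."""
        vec = embed(prompt)
        for cached_vec, answer in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return answer  # identical-intent hit: no API call made
        return None

    def store(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

cache = DedupCache()
cache.store("what does os.path.join do", "It joins path components.")
print(cache.lookup("what does os.path.join do"))   # hit: cached answer
print(cache.lookup("summarize this legal brief"))  # miss: None
```

With real embeddings the cache also catches paraphrases ("what's os.path.join for?"), which a pure word-count vector mostly misses; that is the point of using a semantic model rather than exact-match hashing.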
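Similarly, the hard spend cap can be modeled as a guard that raises before any call would push the day's total over the limit. This is a hypothetical sketch of the behavior behind `LLM_ROUTER_DAILY_SPEND_LIMIT`; the class and method names are assumptions, not the router's code.

```python
# Hypothetical sketch of a hard spend cap: refuse (by raising) BEFORE
# dispatching any call that would push today's total past the limit,
# mirroring the described LLM_ROUTER_DAILY_SPEND_LIMIT behavior.
# SpendGuard and charge() are invented names for illustration.
import os

class BudgetExceeded(RuntimeError):
    """Raised before a call that would blow the daily cap."""

class SpendGuard:
    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.spent_today = 0.0

    def charge(self, estimated_cost):
        """Record a call's cost, raising first if it would exceed the cap."""
        if self.spent_today + estimated_cost > self.daily_limit:
            raise BudgetExceeded(
                f"${estimated_cost:.4f} call would exceed the "
                f"${self.daily_limit:.2f} daily limit"
            )
        self.spent_today += estimated_cost

# Limit read from the env var the README names, with a fallback default.
limit = float(os.environ.get("LLM_ROUTER_DAILY_SPEND_LIMIT", "1.00"))
guard = SpendGuard(daily_limit=limit)
guard.charge(0.015)      # well under the cap: allowed
try:
    guard.charge(limit)  # would exceed the cap: blocked before dispatch
except BudgetExceeded as err:
    print("blocked:", err)
```

Raising before the call (rather than alerting after) is what makes the cap "hard": the budget can never be overshot by a single expensive request.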
This analysis was written by the Genesis Park editorial team with the help of AI. The original post is available via the source link.