New OllamaMQ Release v0.2.5

hackernews | 📦 Open Source
#ai #api #claude #llama #ollama #openai #load-balancing #message-queue #async
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

OllamaMQ is a high-performance, asynchronous message queue dispatcher and load balancer that sits in front of multiple Ollama instances and distributes incoming requests to them in parallel. It ensures fair resource allocation through fair-share round-robin scheduling based on least connections and per-user queues, and automatically monitors backend status with health checks every 10 seconds. A real-time TUI dashboard provides fine-grained traffic control, including VIP and Boost modes and user blocking, and the proxy fully supports OpenAI-compatible endpoints and running as a Docker container.
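For intuition, here is a minimal sketch of least-connections selection with a round-robin tiebreak, the strategy described above. This is illustrative only; the types and function names are hypothetical and not taken from the ollamaMQ source:

```rust
// Hypothetical sketch of least-connections + round-robin backend selection.
// Not the project's actual implementation; names are illustrative only.
struct Backend {
    url: String,     // backend address, e.g. "http://10.0.0.1:11434"
    active: usize,   // number of in-flight requests
    healthy: bool,   // set by the periodic health check
}

/// Pick the index of the next backend: among healthy backends with the
/// fewest active connections, rotate round-robin starting at `rr_cursor`.
fn pick_backend(backends: &[Backend], rr_cursor: &mut usize) -> Option<usize> {
    // Lowest in-flight count among healthy backends; None if all are offline.
    let min = backends
        .iter()
        .filter(|b| b.healthy)
        .map(|b| b.active)
        .min()?;
    let n = backends.len();
    // Scan from the cursor so ties at `min` are served in round-robin order.
    for step in 0..n {
        let i = (*rr_cursor + step) % n;
        if backends[i].healthy && backends[i].active == min {
            *rr_cursor = (i + 1) % n;
            return Some(i);
        }
    }
    None
}
```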

Body

ollamaMQ is a high-performance, asynchronous message queue dispatcher and load balancer designed to sit in front of one or more Ollama API instances. It acts as a smart proxy that queues incoming requests from multiple users and dispatches them in parallel to multiple Ollama backends using a fair-share round-robin scheduler with least-connections load balancing.

Features

- Multi-Backend Load Balancing: Distribute requests across multiple Ollama instances using a Least Connections + Round Robin strategy.
- Parallel Processing: Unlike basic proxies, ollamaMQ can process multiple requests simultaneously (one per available backend), significantly increasing throughput for multiple users.
- Backend Health Checks: Automatically monitors backend status every 10 seconds; offline instances are temporarily skipped and marked in the TUI.
- Per-User Queuing: Each user (identified by the X-User-ID header) has their own FIFO queue.
- Fair-Share Scheduling: Prevents any single user from monopolizing all available backends.
- Transparent Header Forwarding: Full support for all HTTP headers (including X-User-ID) passed to and from Ollama, ensuring compatibility with tools like Claude Code.
- VIP & Boost Modes: Absolute priority (VIP) or increased frequency (Boost) for specific users.
- Real-Time TUI Dashboard: Monitor backend health, active requests, queue depths, and throughput in real time.
- OpenAI Compatibility: Supports standard OpenAI-compatible endpoints.
- Async Architecture: Built on tokio and axum for high concurrency.

Installation

Ensure you have Rust (2024 edition or later) and Ollama installed.

```
cargo install ollamaMQ
```

To build from source instead:

- Clone the repository:

```
git clone https://github.com/Chleba/ollamaMQ.git
cd ollamaMQ
```

- Build and install locally:

```
cargo install --path .
```

Running with Docker Compose

- Ensure Docker and Docker Compose are installed.
- Start your local Ollama instance (defaulting to localhost:11434).
- Run:

```
docker compose up -d
```

Running with Docker

First build the image from the local Dockerfile:

```
docker build -t chlebon/ollamamq .
```

Then run the container:

```
docker run -d \
  --name ollamamq \
  -p 11435:11435 \
  --restart unless-stopped \
  chlebon/ollamamq
```

Configuration

ollamaMQ supports several options to configure the proxy:

- -p, --port: Port to listen on (default: 11435)
- -o, --ollama-urls: Comma-separated list of Ollama server URLs (default: http://localhost:11434)
- -t, --timeout: Request timeout in seconds (default: 300)
- --no-tui: Disable the interactive TUI dashboard (useful for Docker/CI)
- --allow-all-routes: Enable fallback proxy for non-standard endpoints
- -h, --help: Print help message
- -V, --version: Print version information

Example:

```
ollamaMQ --port 8080 --ollama-urls http://10.0.0.1:11434,http://10.0.0.2:11434 --timeout 600
```

Docker example:

```
docker run -d \
  --name ollamamq \
  -p 8080:8080 \
  chlebon/ollamamq --port 8080 --ollama-urls http://192.168.1.5:11434 --timeout 600
```

Usage

Point your LLM clients to the ollamaMQ port (11435) and include the X-User-ID header, as in the Rust client sketch below.
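As an illustration (this example is not from the project's documentation), a minimal Rust client using the third-party reqwest and serde_json crates could call the proxy like this; the user ID and model name are taken from the curl example further down:

```rust
// Illustrative client: sends one chat request through the ollamaMQ proxy.
// Requires the reqwest crate with its "json" feature, plus serde_json and tokio.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let resp = client
        .post("http://localhost:11435/api/chat")
        // X-User-ID selects this client's per-user FIFO queue in ollamaMQ.
        .header("X-User-ID", "developer-1")
        .json(&json!({
            "model": "qwen3.5:35b",
            "messages": [{"role": "user", "content": "Explain quantum computing."}],
            "stream": false
        }))
        .send()
        .await?;
    println!("{}", resp.text().await?);
    Ok(())
}
```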
Supported Endpoints

- GET /health (internal health check)
- GET / (Ollama status)
- POST /api/generate
- POST /api/chat
- POST /api/embed
- POST /api/embeddings
- GET /api/tags
- POST /api/show
- POST /api/create
- POST /api/copy
- DELETE /api/delete
- POST /api/pull
- POST /api/push
- GET/HEAD/POST /api/blobs/{digest}
- GET /api/ps
- GET /api/version
- POST /v1/chat/completions (OpenAI compatible)
- POST /v1/completions (OpenAI compatible)
- POST /v1/embeddings (OpenAI compatible)
- GET /v1/models (OpenAI compatible)
- GET /v1/models/{model} (OpenAI compatible)

Example request:

```
curl -X POST http://localhost:11435/api/chat \
  -H "X-User-ID: developer-1" \
  -d '{
    "model": "qwen3.5:35b",
    "messages": [{"role": "user", "content": "Explain quantum computing."}],
    "stream": true
  }'
```

TUI Dashboard

The interactive TUI dashboard provides a live view of the dispatcher's state:

- j/k or Arrows: Navigate the user/blocked list.
- Tab or h/l: Switch between the Users and Blocked panels.
- p: Toggle VIP status for the selected user (absolute priority).
- b: Toggle Boost status for the selected user (prioritizes every 5th request).
- x: Block the selected user.
- X: Block the selected user's IP address.
- u: Unblock the selected user or IP (works in both panels).
- q or Esc: Exit the dashboard and stop the application.
- ?: Toggle detailed help.

Visual indicators:

- ★ (Magenta): VIP user (absolute priority).
- ⚡ (Yellow): Boosted user (every 5th request priority).
- ▶ (Cyan): Request is currently being processed/streamed.
- ● (Green): User has requests waiting in the queue.
- ○ (Gray): User is idle (no active or queued requests).
- ✖ (Red): User or IP is blocked.

Logging

Logs are automatically written to ollamamq.log in the current working directory. This keeps the terminal clear for the TUI dashboard while allowing you to monitor system events and debug backend communication.

Docker Compose Configuration

The included docker-compose.yml provides a ready-to-use configuration:

```
services:
  ollamamq:
    build: .
    image: chlebon/ollamamq:latest
    container_name: ollamamq
    ports:
      - "11435:11435"
    environment:
      - OLLAMA_URLS=http://host.docker.internal:11434
      - PORT=11435
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
```

Note for Linux: the extra_hosts entry maps host.docker.internal to the host gateway so the container can reach an Ollama instance running on the host.
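Finally, a hedged sketch of how a periodic probe like the 10-second backend health check described above could be structured with tokio and reqwest. This is not the project's actual code; the function name and logging are hypothetical:

```rust
// Hypothetical sketch of a 10-second backend health-check loop.
// Requires the tokio and reqwest crates; not ollamaMQ's real implementation.
use std::time::Duration;

async fn health_check_loop(urls: Vec<String>) {
    let client = reqwest::Client::new();
    let mut ticker = tokio::time::interval(Duration::from_secs(10));
    loop {
        ticker.tick().await;
        for url in &urls {
            // Ollama answers GET / with a plain status message when it is up.
            let healthy = match client.get(url).send().await {
                Ok(resp) => resp.status().is_success(),
                Err(_) => false,
            };
            // A real dispatcher would mark the backend and skip it while
            // offline, then flag the state change in the TUI.
            println!("{url}: {}", if healthy { "online" } else { "offline" });
        }
    }
}
```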

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
