I built Rubric, an open-source Sentry for AI. Looking for beta testers

hackernews | 📦 Open Source
#ai #ai-deals #anthropic #gpt-4 #llm #openai #rubric #open-source #output-quality
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

The open-source project Rubric pitches itself as "Sentry for AI": it sits between your application and your LLM provider, monitors and scores output quality in real time, and sends alerts when problems appear. Integration takes a single line (pointing `baseURL` at the proxy) rather than complex code changes, and it is compatible with OpenAI, Groq, Anthropic, and other providers. It evaluates quality along eight dimensions, such as responses that are too short or poorly related to the prompt, lets you track issues in a dashboard or set up webhook alerts, and ships as a self-hosted, MIT-licensed solution.

Full text

Sentry for AI — LLM output quality monitoring in production.

Rubric sits between your app and any LLM provider. It logs every call, scores the output quality automatically, and alerts you when something drifts — before your users notice.

```
Your App → Rubric Proxy → OpenAI / Anthropic / Groq / ...
                 ↓
  Quality Score · Flag Detection · Drift Alerting · Dashboard
```

Most LLM monitoring tools require you to instrument your code, set up complex pipelines, or pay for enterprise contracts. Rubric is:

- 1-line integration — change `baseURL`, done
- Works with any OpenAI-compatible API — OpenAI, Groq, Together, OpenRouter, local models
- Heuristic + LLM-as-judge — fast scoring on every call, deep evaluation sampled at 10%
- Open source — MIT license, self-hostable, no vendor lock-in

Quick start:

```bash
git clone https://github.com/rubric-dev/rubric
cd rubric
cp .env.example .env   # add your ADMIN_SECRET
npm install
npm run proxy
```

Create a project key:

```bash
curl -X POST http://localhost:3000/api/keys \
  -H "Authorization: Bearer your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-project"}'
# → {"key": "gk-..."}
```

JavaScript / TypeScript:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://localhost:3000/v1",          // point to Rubric
  defaultHeaders: { "x-guard-key": "gk-..." },  // your Rubric key
});

// All existing OpenAI calls work unchanged
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this article..." }],
});
```

Python:

```python
import openai
from rubric import openai_config

client = openai.OpenAI(**openai_config(
    guard_key="gk-...",
    base_url="http://localhost:3000"
))

# All existing calls work unchanged
response = client.chat.completions.create(...)
```

Groq / other providers — add an `x-provider` header:

```ts
defaultHeaders: {
  "x-guard-key": "gk-...",
  "x-provider": "groq"  // or: openai, anthropic, together, openrouter
}
```

Dashboard:

```bash
npm run dashboard
# → http://localhost:3001
```

Rubric scores every LLM response on 8 dimensions:

| Flag | What it catches | Score penalty |
|---|---|---|
| `too_short` | Response too brief for the prompt complexity | −40% |
| `refusal` | Model refused or declined the request | −30% |
| `low_relevance` | Output doesn't relate to the prompt | −25% |
| `hallucination_risk` | Ungrounded statistics, fake citations, invented data | −20% |
| `format_mismatch` | Asked for JSON/markdown but got plain text | −15% |
| `language_mismatch` | Response in wrong language | −15% |
| `repetitive` | Repeated sentences, trigrams, or word stems | −15% |
| `verbose_padding` | Filler phrases, marketing fluff, over-long responses | −10% |

Scores range 0.0–1.0. A score below 0.7 indicates a problematic response.

Set `JUDGE_API_KEY` (an Anthropic key) in `.env` to enable deep quality evaluation on 10% of calls. The judge score is blended with the heuristics (70/30).
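To make the scoring model concrete, here is a minimal sketch of how the flags, penalties, and judge blend could fit together. The flag names, penalty percentages, 0.7 threshold, and 70/30 split come from the description above; everything else (subtracting penalties from a base of 1.0, weighting the judge side at 0.7, the function names) is an assumption for illustration, not Rubric's actual implementation.

```ts
// Hypothetical sketch of Rubric-style scoring: not the project's real code.
// Assumes penalties subtract from a base score of 1.0 and that the judge
// score, when sampled, is blended at 70% judge / 30% heuristic.
const PENALTIES: Record<string, number> = {
  too_short: 0.40,
  refusal: 0.30,
  low_relevance: 0.25,
  hallucination_risk: 0.20,
  format_mismatch: 0.15,
  language_mismatch: 0.15,
  repetitive: 0.15,
  verbose_padding: 0.10,
};

function heuristicScore(flags: string[]): number {
  // Subtract each triggered flag's penalty from 1.0, clamped to [0, 1].
  const penalty = flags.reduce((sum, f) => sum + (PENALTIES[f] ?? 0), 0);
  return Math.max(0, 1 - penalty);
}

function finalScore(flags: string[], judgeScore?: number): number {
  const heuristic = heuristicScore(flags);
  // The judge runs on ~10% of calls; blend 70/30 only when it is present.
  return judgeScore === undefined
    ? heuristic
    : 0.7 * judgeScore + 0.3 * heuristic;
}

// Example: a repetitive, padded response with a judge score of 0.6:
// heuristic = 1 - (0.15 + 0.10) = 0.75; blended = 0.7*0.6 + 0.3*0.75 = 0.645
console.log(finalScore(["repetitive", "verbose_padding"], 0.6)); // 0.645, below 0.7
```

Under this reading, two mid-weight flags (say −25% and −15%) already pull the heuristic score down to 0.60, under the 0.7 alert line, and a mediocre judge score drags down even a clean heuristic pass.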
The Rubric dashboard gives you the Sentry-style flow:

- Problems overview — which quality issues are most common, with counts and percentages
- Click a problem → a filtered trace list showing only the affected calls
- Click a trace → full detail: prompt, response, quality analysis, metrics

Configure a webhook to get notified when quality drops (a minimal receiver sketch follows at the end of this section):

```bash
curl -X POST http://localhost:3000/api/alerts \
  -H "x-guard-key: gk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "threshold": 0.15,
    "window_hours": 24,
    "webhook_url": "https://hooks.slack.com/..."
  }'
```

Works with Slack, Discord, or any HTTP webhook.

| Provider | `x-provider` value |
|---|---|
| OpenAI | `openai` (default) |
| Groq | `groq` |
| Anthropic | `anthropic` |
| Together AI | `together` |
| OpenRouter | `openrouter` |

Docker:

```bash
docker compose up
# Proxy on :3000, Dashboard on :3001
```

Repository layout:

```
packages/
  proxy/       — Hono.js proxy server (Node.js + TypeScript)
  sdk/         — TypeScript/JavaScript SDK
  sdk-python/  — Python SDK
  dashboard/   — Next.js dashboard
examples/      — 11 example apps for testing
```

License: MIT — see LICENSE.
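As referenced in the alerts section above, here is a minimal sketch of an HTTP endpoint that could receive Rubric's quality alerts. The port, path, and payload fields are assumptions for illustration; the post only says Rubric calls your `webhook_url`, so check the actual payload shape before relying on specific fields.

```ts
// Hypothetical alert receiver. The payload shape is assumed, not Rubric's
// documented format. Uses only Node's built-in http module.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/rubric-alert") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const alert = JSON.parse(body); // assumed JSON body with score/threshold info
    console.log("Rubric quality alert received:", alert);
    // Fan out to a pager, Slack, or your own tooling here.
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });
});

server.listen(4000, () => console.log("Listening for Rubric alerts on :4000"));
```

To wire it up, register the endpoint with the `/api/alerts` call above, e.g. `"webhook_url": "http://your-host:4000/rubric-alert"`.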

This analysis was produced by the Genesis Park editorial team with the help of AI. The original post is available at the source link above.
