Litmus – A Flight Recorder for AI Agents (Record and Replay Every LLM Execution)
📦 Open source · #ai-deals #ai-agents #anthropic #claude #gemini #llama #llm #mistral #openai #perplexity #execution-recording #replay
Source: hackernews · Summarized and analyzed by Genesis Park
Summary
This post introduces Litmus, a tool for recording and deterministically replaying AI agent executions. It is compatible with 14+ LLM providers and intercepts every API call by patching the HTTP transport layer of the OpenAI and Anthropic SDKs at runtime. Recorded trace files can be replayed to reproduce identical results without calling the real APIs, which makes debugging production bugs far more efficient. Litmus can also inject failure scenarios such as LLM refusals, timeouts, and 500 errors to test an agent's resilience, and can gate CI deploys based on a reliability score.
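To make the record/replay mechanism concrete, here is a minimal sketch in plain Python. It does not use Litmus's actual internals: `FakeSDKClient` and its `_send` method are hypothetical stand-ins for an SDK's transport-level send function, which is the layer Litmus patches at runtime.

```python
import functools


class FakeSDKClient:
    """Hypothetical stand-in for an LLM SDK client (for illustration only)."""

    def _send(self, payload):
        # In a real SDK this would perform an HTTP call to the provider.
        return {"content": "live answer", "stop_reason": "end_turn"}


def record(client, trace):
    """Wrap the client's low-level send so every call is captured."""
    original = client._send

    @functools.wraps(original)
    def recording_send(payload):
        response = original(payload)
        trace.append({"request": payload, "response": response})
        return response

    client._send = recording_send


def replay(client, trace):
    """Serve recorded responses in order; the real send is never called."""
    steps = iter(trace)
    client._send = lambda payload: next(steps)["response"]
```

During replay the agent receives exactly the recorded responses, which is what makes the run deterministic. Note the sequential-order assumption: responses are served in recorded order, matching the limitation the author describes later.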
Full Text
Record and deterministically replay AI agent executions. Litmus captures every LLM and tool call your agent makes, saving structured trace files you can inspect, share, and replay.

```
pip install litmus-trace

# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py

# View the trace
litmus view ./traces/lt-abc123.trace.json
```

Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.

- **Record** — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file. API keys are automatically redacted.
- **View** — Pretty-print traces with step-by-step details, latency, and model info.
- **Replay** — Feed recorded responses back to your agent. Same code path, same output, no real API calls.
- **Fault Injection** — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out?
- **CI Gating** — Score your trace corpus for reliability and block deploys that drop below a threshold.

Join the Discord to get notified when these features launch.

Wrap a command (zero code changes):

```
litmus run python my_agent.py
```

Or record in-process:

```python
import litmus

litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
```

Or run the recording proxy and point your SDK at it:

```
litmus proxy --mode record

# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
```

Works with any LLM API out of the box:

| Provider | Status |
|---|---|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |

Custom/self-hosted models:

```
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
```

CLI commands:

| Command | Description |
|---|---|
| `litmus run` | Wrap a command to record (zero code changes) |
| `litmus view` | Pretty-print a trace file |
| `litmus proxy` | Start the recording proxy server |
| `litmus providers` | List all supported providers |
| `litmus replay` | Replay a trace (coming soon — requires Litmus Cloud) |
| `litmus ci` | Score traces and gate deploys (coming soon — requires Litmus Cloud) |

How it works: Litmus monkey-patches the httpx transport layer used by both the Anthropic and OpenAI Python SDKs. When you call `client.messages.create(...)`, Litmus intercepts the HTTP request before it leaves your machine.

- **Record mode:** The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
- **Replay mode:** The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.

Security notes:

- API keys (`Authorization`, `x-api-key`) are automatically redacted from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts

Limitations:

- Python only — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- httpx-based SDKs — works with SDKs that use httpx under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs using `requests` or `aiohttp` are not intercepted.
- Sequential replay — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- No tool call recording — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.

Community:

- Discord — fastest way to get help, share traces, and request features
- GitHub Issues — bug reports and feature requests
- PyPI — package

I'm building Litmus in the open and I want to hear from you — whether it's a bug, a feature idea, or just telling me about your agent setup. I personally respond to everything.

- Email: [email protected]
- Discord: romirj (join the server)
- Twitter/X: @romir_jain

If you're running agents in production and want to use Litmus, I'll personally help you set it up. DM me anywhere.

Observability tools (LangSmith, Langfuse) tell you what happened. They log traces. Litmus captures the full picture. Every LLM call, every response, every token — in a structured trace file you can inspect, share, and (soon) replay deterministically with fault injection. LangSmith is the dashcam. Litmus is building the crash test facility.

License: MIT
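As a rough illustration of the fault-injection idea mentioned above, mutating a recorded trace step into a server error before replay might look like the sketch below. The trace-step layout here is an assumption for illustration, not Litmus's actual on-disk format.

```python
import copy


def inject_http_500(step):
    """Return a copy of a recorded trace step mutated into an HTTP 500.

    The step layout ("request"/"response" dicts with a "status" field)
    is assumed for illustration; Litmus's real trace schema may differ.
    """
    mutated = copy.deepcopy(step)
    mutated["response"] = {
        "status": 500,
        "error": {"type": "api_error", "message": "Internal server error"},
    }
    return mutated


# A hypothetical recorded step, as it might appear in a trace file.
recorded = {
    "request": {"model": "claude-sonnet", "messages": [{"role": "user", "content": "hi"}]},
    "response": {"status": 200, "content": "Hello!", "stop_reason": "end_turn"},
}
faulty = inject_http_500(recorded)
```

Replaying the mutated trace would exercise the agent's error-handling path without ever touching the real API, which is the point of crash-testing resilience this way.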
This analysis was written by the Genesis Park editorial team with the help of AI. The original post can be found via the source link.