Agent VCR: Time-Travel Debugging for LLM Agents (Rewind, Edit State, Resume)

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

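Agent VCR is a time-travel debugging tool that records an LLM agent's execution and lets you jump to a specific step, edit the state, and resume from there. A run that failed midway can be fixed and resumed without re-executing the earlier steps, so those steps cost no additional tokens. The 100% saving cited in the README applies to replaying a saved successful run, which makes no LLM calls at all. It also uses real-time AST analysis to catch duplicate functions and complexity spikes, and helps the agent correct itself.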

Body

Record · Rewind · Edit · Resume, without re-running anything.

    pip install ai-agent-vcr

No API keys. No cloud. Runs entirely locally.

With Agent VCR:

    from agent_vcr import VCRPlayer
    from agent_vcr.models import ResumeConfig

    player = VCRPlayer.load("run.vcr")

    # Jump to step 8, see what went wrong
    state = player.goto_frame(7)

    # Fix it and resume (skip steps 0-7); `agent` is your own agent callable
    player.resume(agent, ResumeConfig(
        from_frame=7,
        state_overrides={"prompt": "fixed"}
    ))

- Jump to any step. Full state snapshot at every node. Inspect input, output, diffs.
- Fix a prompt, patch a tool output, inject context, then resume from that point. No re-runs.
- Fork from any frame. Create parallel runs. Compare how fixes change downstream behavior.
- Save successful runs. Replay the same task instantly: zero tokens, zero cost, 100% savings.
- Real-time AST analysis catches duplicate functions and complexity spikes, and makes the agent self-correct.
- Recording overhead stays under 5ms at P99. Benchmarked in CI on every commit. Safe for production.

Record an existing agent step by step:

    from agent_vcr import VCRRecorder

    recorder = VCRRecorder()
    recorder.start_session("my_run")

    # Your existing agent code, unchanged
    state = {"query": "build a REST API"}

    input_state = dict(state)   # copy the state before the step runs
    state = planner(state)      # step 1
    recorder.record_step("planner", input_state, state)

    input_state = dict(state)   # copy again before the next step
    state = coder(state)        # step 2
    recorder.record_step("coder", input_state, state)

    recorder.save()  # → .vcr/my_run.vcr

Or use the context manager, so frames are never lost even if the agent crashes:

    with VCRRecorder() as recorder:
        recorder.start_session("my_run")
        # ... your agent code ...
    # auto-saved on exit

Load a recording to inspect, diff, and resume:

    from agent_vcr import VCRPlayer
    from agent_vcr.models import ResumeConfig

    player = VCRPlayer.load(".vcr/my_run.vcr")

    # Inspect any step
    print(player.goto_frame(0))  # {'query': 'build a REST API', ...}
    print(player.goto_frame(1))  # {'plan': '...', 'steps': [...], ...}
    print(player.get_errors())   # see what failed

    # Diff two frames
    diff = player.compare_frames(0, 1)
    # {'added': {'plan': ...}, 'modified': {'query': ...}, ...}

    # Fix and resume from step 1 with a different plan
    player.resume(
        agent_callable=coder,
        config=ResumeConfig(
            from_frame=1,
            state_overrides={"plan": "use FastAPI instead of Flask"}
        )
    )

LangGraph integration:

    from langgraph.graph import StateGraph
    from agent_vcr import VCRRecorder
    from agent_vcr.integrations.langgraph import VCRLangGraph

    graph = StateGraph(MyState)
    graph.add_node("planner", planner_node)
    graph.add_node("coder", coder_node)
    graph.add_edge("planner", "coder")

    recorder = VCRRecorder()
    graph = VCRLangGraph(recorder).wrap_graph(graph)  # one line

    result = graph.invoke({"query": "Build a todo app"})
    recorder.save()

CrewAI integration:

    from crewai import Crew
    from agent_vcr import VCRRecorder
    from agent_vcr.integrations.crewai import VCRCrewAI

    recorder = VCRRecorder()
    recorder.start_session("crew_run")

    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    result = VCRCrewAI(recorder).kickoff(crew)
    recorder.save()

Install extras:

    pip install "ai-agent-vcr[crewai]"
    pip install "ai-agent-vcr[langgraph]"

Record a single function with the vcr_record decorator:

    from agent_vcr import VCRRecorder
    from agent_vcr.integrations.langgraph import vcr_record

    recorder = VCRRecorder()

    @vcr_record(recorder, node_name="research_step")
    def research(state: dict) -> dict:
        return {"findings": search(state["query"])}

Databases solved the partial-failure problem 40 years ago. Agents have the exact same problem: when your agent fails mid-run, you don't just have bad in-memory state. You have files written to disk that shouldn't exist. Current tools only roll back state objects. The filesystem stays polluted.
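The partial-failure problem described above is easy to reproduce with the standard library alone. The sketch below is illustrative only and does not use Agent VCR; `failing_step` and `run_with_state_rollback` are hypothetical names. It shows that restoring a snapshot of the state object leaves the file the step wrote on disk.

```python
import copy
import tempfile
from pathlib import Path


def failing_step(state: dict, workspace: Path) -> dict:
    """Hypothetical agent step: writes a file, updates state, then fails."""
    (workspace / "handlers.py").write_text("def broken(: pass\n")
    state["files"].append("handlers.py")
    raise RuntimeError("step failed after writing to disk")


def run_with_state_rollback(workspace: Path) -> tuple[dict, list[str]]:
    state = {"query": "build a REST API", "files": []}
    snapshot = copy.deepcopy(state)  # checkpoint of the in-memory state only
    try:
        failing_step(state, workspace)
    except RuntimeError:
        state = snapshot  # roll the state object back
    # List what is still on disk after the "rollback"
    leftover = sorted(p.name for p in workspace.iterdir())
    return state, leftover


with tempfile.TemporaryDirectory() as tmp:
    state, leftover = run_with_state_rollback(Path(tmp))
    print(state["files"])  # [] : the state object is clean again
    print(leftover)        # ['handlers.py'] : the file is still there
```

Rolling back the filesystem as well needs a second mechanism that tracks files, which is the gap the README's git-backed approach is meant to close.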
Agent VCR wraps agent execution in real transactional semantics:

    from agent_vcr import VCRRecorder
    from agent_vcr.integrations.openhands import ACIDWorkspace

    recorder = VCRRecorder()
    acid = ACIDWorkspace("/my/workspace", recorder=recorder)

    acid.begin(session_id="task-001")          # isolated git branch
    acid.savepoint(state, node_name="coder")   # checkpoint state + filesystem
    acid.savepoint(state, node_name="tester")

    # Agent writes bad code at step 4: rollback
    acid.rollback(to_frame_index=1)
    # git reset --hard → bad files are GONE from disk, not just hidden

    acid.commit()  # merge clean branch into main

- BEGIN → isolated git branch per agent session. Parallel agents can't clobber each other.
- SAVEPOINT → checkpoints both VCR state AND filesystem. Every frame has a matching git commit.
- ROLLBACK → git reset --hard. Files your agent hallucinated are physically deleted.
- COMMIT → clean merge back into main.

Run the example:

    python examples/acid_golden_run.py

When your agent succeeds, save the entire execution as a replayable ghost run. Next time you hit the same task, replay it instantly: zero LLM calls, zero tokens, zero cost.

    from agent_vcr.golden_cache import GoldenRunCache

    cache = GoldenRunCache()

    # After a successful run:
    cache.save_golden_run("Build a REST API with JWT auth", recorder)

    # Next time: instant, $0.00
    outputs, ledger = cache.replay("Build a REST API with JWT auth")
    print(ledger)
    # CostLedger(saved=100% | $0.0123 | 4,100 tokens | 2,349ms)

The CostLedger tracks original vs replay: tokens, dollars, milliseconds, and reduction percentage.
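Conceptually, a ghost replay like the one described above is a cache lookup keyed by the task. The sketch below is a minimal illustration under stated assumptions, not the library's implementation: `TinyGoldenCache`, `Ledger`, and `fingerprint` are hypothetical names, and fingerprinting by a SHA-256 hash of the whitespace- and case-normalized task text is an assumption (the README only says a fingerprint is returned).

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Ledger:
    """Hypothetical stand-in for a cost ledger: original vs replay."""
    original_tokens: int
    original_cost_usd: float
    replay_tokens: int = 0
    replay_cost_usd: float = 0.0

    @property
    def saved_percent(self) -> float:
        if self.original_cost_usd == 0:
            return 0.0
        saved = self.original_cost_usd - self.replay_cost_usd
        return 100.0 * (saved / self.original_cost_usd)


def fingerprint(task: str) -> str:
    # Assumption: normalize case and whitespace, then hash the text.
    normalized = " ".join(task.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class TinyGoldenCache:
    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def save(self, task: str, outputs: list, tokens: int, cost_usd: float) -> str:
        key = fingerprint(task)
        record = {"outputs": outputs, "tokens": tokens, "cost_usd": cost_usd}
        (self.cache_dir / f"{key}.json").write_text(json.dumps(record))
        return key

    def replay(self, task: str):
        path = self.cache_dir / f"{fingerprint(task)}.json"
        if not path.exists():
            return None  # cache miss: the caller has to run the agent
        record = json.loads(path.read_text())
        return record["outputs"], Ledger(record["tokens"], record["cost_usd"])


with tempfile.TemporaryDirectory() as tmp:
    cache = TinyGoldenCache(Path(tmp))
    cache.save("Build a REST API with JWT auth", [{"plan": "..."}], 4100, 0.0123)
    # Same task with different spacing and case still hits the cache
    outputs, ledger = cache.replay("build a REST API  with JWT auth")
    miss = cache.replay("Build a GraphQL API")
    print(outputs, ledger.saved_percent, miss)
```

An exact-match key like this only helps when the task text repeats; how loosely two tasks may differ and still count as "the same task" is a design decision this sketch does not address.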
The demo shows the savings live:

    python examples/acid_golden_run.py

    RUN 1: Original          RUN 2: Ghost Replay
    Tokens:  4,100           Tokens:  0
    Cost:    $0.0123         Cost:    $0.00
    Latency: 2,350ms         Latency: 1ms

    💰 Savings: 100% · $0.0123 · 4,100 tokens · 2,349ms

Run the terminal debugger on any recorded session:

    vcr-tui .vcr/my_run.vcr

    ┌──────────────────────────────────────────────────────────┐
    │ 📼 Agent VCR TUI              Session: my_run · 8 frames │
    ├──────────────────────────────────────────────────────────┤
    │ ▶ Frame 0 │ planner    │ 100ms │ ●                       │
    │   Frame 1 │ researcher │ 250ms │ ●                       │
    │   Frame 2 │ coder      │ 480ms │ ✗ ERROR                 │
    │   Frame 3 │ tester     │  80ms │ ●                       │
    ├──────────────────────────────────────────────────────────┤
    │ State at frame 0:                                        │
    │ { "query": "build a todo app",                           │
    │   "context": "...",                                      │
    │   "plan": null }                                         │
    ├──────────────────────────────────────────────────────────┤
    │ ← → navigate │ e edit │ d diff │ r resume │ q quit       │
    └──────────────────────────────────────────────────────────┘

Keybindings:

- ← → : navigate frames
- e : edit state inline (opens editor, saves on exit)
- d : diff current frame vs previous
- r : resume from current frame
- f : fork current frame to new session
- q : quit

See your agent's full execution graph, including forks, parallel branches, and error paths:

    vcr-server .vcr/
    # Open localhost:8000

The dashboard renders your session as a DAG:

    original_run ────────────────────────────────────────────► [done]
         │ frame 3
         ╰──► fork_v1 ──► [coder] ──► [tester] ──► [done]
                  │
                  ╰──► fork_v2 ──► [coder] ──► [done]

- Every fork is a branch node
- Error frames shown in red
- Click any node to inspect full state
- Live WebSocket streaming for in-progress sessions

"Code is cheap now. Good code is not." (Graham Neubig, OpenHands Chief Scientist)

Sentinel watches every file an AI agent writes and catches quality violations in real time, before the agent moves on.
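To make the idea of an AST-based quality check concrete, here is a minimal sketch built on Python's standard `ast` module. It is not Sentinel's implementation: `check_source` is a hypothetical function, the 40-line and 5-parameter limits simply mirror the limits quoted in the README's demo output, and cyclomatic complexity is left out to keep the example short.

```python
import ast

MAX_LINES = 40   # assumption: mirrors the "max 40" in the README demo output
MAX_PARAMS = 5   # assumption: mirrors the "max 5" in the README demo output


def check_source(source: str, known_functions: set[str]) -> list[str]:
    """Return human-readable violations for one Python source file."""
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Duplicate check: the name was already defined in an earlier file
        if node.name in known_functions:
            violations.append(f"{node.name}() already exists elsewhere")
        # Length check: lines spanned by the function definition
        length = node.end_lineno - node.lineno + 1
        if length > MAX_LINES:
            violations.append(f"{node.name}() is {length} lines (max {MAX_LINES})")
        # Parameter-count check across all named parameter kinds
        args = node.args
        n_params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if n_params > MAX_PARAMS:
            violations.append(
                f"{node.name}() has {n_params} parameters (max {MAX_PARAMS})"
            )
    return violations


existing = {"hash_password"}  # names collected from files written earlier
new_file = '''
def hash_password(pw):
    return pw[::-1]

def handle(a, b, c, d, e, f):
    return a
'''
violations = check_source(new_file, existing)
for v in violations:
    print(v)
```

Feeding the returned messages back to the agent as its next instruction is one plausible way to get the self-correction loop the README describes.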
    from openhands_sentinel import Sentinel
    from agent_vcr import VCRRecorder

    recorder = VCRRecorder()
    sentinel = Sentinel(recorder=recorder)
    sentinel.attach(runtime.event_stream)
    # 3 lines, auto-intercepts every file write

Run the demo:

    python examples/sentinel_demo.py

    STEP 1: Agent writes auth/utils.py
    🛡️ SENTINEL: auth/utils.py - CLEAN ✓

    STEP 2: Agent writes handlers.py
    🛡️ SENTINEL: VIOLATIONS DETECTED!
      CRITICAL  hash_password() already exists in auth/utils.py:8 - reuse it
      CRITICAL  handle_auth_request() is 109 lines (max 40) - break it up
      CRITICAL  Cyclomatic complexity 32 (max 8) - simplify
      WARNING   9 parameters (max 5) - use a config object

    STEP 3: Agent self-corrects
    🛡️ SENTINEL: handlers.py - CLEAN ✓ All issues resolved!

    📼 Audit trail: .vcr/sentinel-demo.vcr

Or scan any directory standalone:

    sentinel scan ./my-ai-project

| Without Sentinel | With Sentinel |
|---|---|
| Agent writes bad code | Agent writes bad code |
| Human reviews PR | Sentinel catches in [...] |

API reference:

    [...] Path
    recorder.fork(from_frame=3) -> VCRRecorder   # branch from a frame

    # Context manager: auto-saves on exit
    with VCRRecorder() as r:
        r.start_session("run")
        ...

    player = VCRPlayer.load(".vcr/my_run.vcr")
    player = VCRPlayer.load_by_id("my_run")

    player.goto_frame(index)        # → dict (output state at frame N)
    player.get_frame(index)         # → Frame object
    player.get_input_state(index)   # → dict (input state at frame N)
    player.list_nodes()             # → ['planner', 'coder', ...]
    player.get_errors()             # → [Frame, ...]
    player.compare_frames(a, b)     # → {'added': {}, 'removed': {}, 'modified': {}}
    player.get_total_latency()      # → float (ms)
    player.get_total_tokens()       # → int
    player.get_total_cost()         # → float (USD)

    player.resume(
        agent_callable,                   # your agent function
        config=ResumeConfig(
            from_frame=7,                 # rewind to BEFORE step 7 ran
            state_overrides={"k": "v"},   # apply these before re-running
            mode=ResumeMode.FORK,         # FORK | REPLAY | MOCK
        )
    ) -> str  # new session ID

    acid = ACIDWorkspace("/workspace", recorder=recorder)
    acid.begin(session_id="task-001")
    acid.savepoint(state, node_name="coder")
    acid.rollback(to_frame_index=2)   # git reset --hard
    acid.commit()                     # merge to main

    from agent_vcr.golden_cache import GoldenRunCache

    cache = GoldenRunCache(cache_dir=".vcr/golden")
    cache.save_golden_run(task_description, recorder) -> str   # fingerprint
    cache.replay(task_description) -> (outputs, CostLedger)
    cache.invalidate(task_description) -> bool
    cache.list_runs() -> list[dict]

Examples:

    # Basic recording and playback
    python examples/basic_usage.py

    # Time-travel: rewind, edit state, resume (with assertion)
    python examples/time_travel_demo.py

    # LangGraph auto-instrumentation
    python examples/langgraph_integration.py

    # ACID transactions + Ghost Replay (most impressive demo)
    python examples/acid_golden_run.py

    # OpenHands Sentinel: agent self-correction live
    python examples/sentinel_demo.py

    # Async recording
    python examples/async_example.py

Sessions are plain JSONL, one JSON object per line:

    {"type": "session", "data": {"session_id": "my_run", "created_at": "2024-01-01T00:00:00Z", ...}}
    {"type": "frame", "data": {"node_name": "planner", "input_state": {...}, "output_state": {...}, "metadata": {"latency_ms": 120}}}
    {"type": "frame", "data": {"node_name": "coder", ...}}

- Human-readable: open in any text editor
- Git-diffable: review agent state changes in PRs
- Append-only: no rewrites, safe for concurrent agents
- Streamable: parse line-by-line, no full-file load required

Recording overhead is benchmarked in CI on every commit and must stay under 5ms P99.

    pytest tests/benchmarks/ -v --benchmark-only

Results are published at ixchio.github.io/agent-vcr/dev/bench/.

- Core recording and playback
- Time-travel resume with state injection
- FastAPI server with live WebSocket streaming
- LangGraph integration
- CrewAI integration
- Async recorder and player
- Terminal TUI debugger (vcr-tui)
- React dashboard with DAG visualization
- ACID Transactions (git-backed filesystem rollback)
- Ghost Replay (zero-cost replay of successful runs)
- 🛡️ OpenHands Sentinel (real-time code quality guardian)
- Context manager (with VCRRecorder() as r:)
- AutoGen integration
- Cloud storage backend (S3, GCS)
- Collaborative debugging (share sessions)
- Replay regression tests (run golden paths as CI assertions)

Development setup:

    git clone https://github.com/ixchio/agent-vcr.git
    cd agent-vcr
    pip install -e ".[dev,tui]"
    pytest tests/unit/ -v

See CONTRIBUTING.md for guidelines.

License: MIT (see LICENSE).
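Because sessions are described as plain JSONL, a recording can in principle be read without the library. The sketch below is a minimal illustration using only the standard library: `load_frames` and `diff_states` are hypothetical helpers, the sample records follow the field names shown in the README's format example, and the before/after shape of the "modified" entries is an assumption.

```python
import io
import json


def load_frames(lines) -> list[dict]:
    """Collect the 'data' payload of every frame record, one JSON object per line."""
    frames = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") == "frame":
            frames.append(record["data"])
    return frames


def diff_states(before: dict, after: dict) -> dict:
    """Shallow diff of two state dicts, grouped as added/removed/modified."""
    return {
        "added": {k: after[k] for k in after.keys() - before.keys()},
        "removed": {k: before[k] for k in before.keys() - after.keys()},
        "modified": {
            k: {"before": before[k], "after": after[k]}
            for k in before.keys() & after.keys()
            if before[k] != after[k]
        },
    }


# An in-memory stand-in for a .vcr file with one session and one frame record
session = io.StringIO(
    '{"type": "session", "data": {"session_id": "my_run"}}\n'
    '{"type": "frame", "data": {"node_name": "planner", '
    '"input_state": {"query": "build a REST API"}, '
    '"output_state": {"query": "build a REST API", "plan": "use Flask"}}}\n'
)
frames = load_frames(session)
diff = diff_states(frames[0]["input_state"], frames[0]["output_state"])
print(diff)  # {'added': {'plan': 'use Flask'}, 'removed': {}, 'modified': {}}
```

Reading line by line keeps memory use flat for long sessions, which is the property the "streamable" bullet points at.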

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
