Show HN: Tarmac – Know what Claude Code will cost before you run it
🔬 Research
#ai coding
#claude
#claude code
#review
#show hn
#dev tools
#cost estimation
Source: hackernews · Summarized and analyzed by Genesis Park
Summary
Tarmac is a tool built to prevent surprise bills from Claude Code: it analyzes your prompt before execution and estimates a cost range. It uses a prediction model trained on 3,000 SWE-bench tasks to predict costs with about 81% accuracy, is fully open source, and does not track user data.
Body
Pre-flight cost estimation for Claude Code. Know what your AI coding task will cost before it runs. Tarmac hooks into Claude Code, intercepts your prompt, and shows a calibrated cost range — so you can proceed, switch models, or cancel before spending a cent.

Claude Code has zero cost visibility. You type a prompt, it runs for 2 minutes or 20 minutes, and you find out the cost after it's done. For complex tasks on Opus, that can be $5-20+ per prompt — and there's no way to know in advance.

Tarmac installs as a Claude Code hook. Every time you submit a prompt, Tarmac intercepts it, extracts features, runs a trained regression model with conformal prediction intervals, and injects a cost estimate into Claude's context. Claude then presents the estimate and asks whether to proceed.

```
⚡ TARMAC COST ESTIMATE
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Sonnet 4.6   $0.12 - $0.89
Opus 4.6     $0.58 - $4.34
Haiku 4.5    $0.03 - $0.22

Task type: code modification
Input: 847 tokens
Coverage: 80% confidence interval
Method: conformal-regression
━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

No API key required. No external calls. Everything runs locally in ~5ms.

```
npm install -g tarmac-cost
tarmac-cost setup
```

That's it. Open Claude Code and every prompt (5+ words) will now include a cost estimate.

After a session, run `tarmac-cost report` to compare the estimate to what actually happened:

```
$ tarmac-cost report

📊 TARMAC SESSION REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model:     Opus 4.6
Estimated: $0.58 - $4.34
Actual:    $2.17
Result:    ✅ Within estimate

API calls: 12
Duration:  94s
━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

The report compares the last estimate against the actual cost from that session's transcript. Run it after exiting a Claude Code session to see how the prediction held up.

To uninstall, remove the Tarmac hook entries from `~/.claude/settings.json`.

Validated on 3,381 real tasks (3,000 SWE-bench + 381 local Claude Code sessions):

| Dataset | Coverage (80% target) | Median Interval Width | vs Heuristic Baseline |
|---|---|---|---|
| Overall (n=3,381) | 81.1% | $0.78 | +19.3pp |
| SWE-bench (n=3,000) | 83.6% | $0.85 | +14.3pp |
| Opus 4.6 | 84.5% | $1.13 | +17.3pp |
| Sonnet 4.6 | 81.7% | $0.67 | +13.0pp |
| Haiku 4.5 | 84.6% | $0.47 | +7.8pp |

"Coverage" = percentage of actual costs that fell within the predicted range. An 80% target means you should expect ~4 out of 5 estimates to contain the true cost. We hit 81.1% overall.

```
┌────────────┐     ┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│  You type  │────▶│ Claude Code  │────▶│     Tarmac     │────▶│    Claude    │
│  a prompt  │     │  hook fires  │     │ estimates cost │     │ presents it  │
└────────────┘     └──────────────┘     └────────────────┘     └──────────────┘
```

- Hook intercept — Claude Code's UserPromptSubmit hook pipes your prompt to `tarmac-cost estimate` via stdin
- Feature extraction — 24 features extracted from prompt text (length, code blocks, file paths, task keywords, vocabulary richness, etc.)
- Per-model regression — Separate ridge regression models for Opus, Sonnet, and Haiku predict log₁₀(cost)
- Conformal calibration — Residuals from a held-out calibration set determine the interval width needed for 80% coverage
- Output — The estimate is injected as additionalContext into Claude's system prompt, which Claude then presents to the user

Traditional approaches (heuristic multipliers, percentile-based ranges) can't provide coverage guarantees. Conformal prediction is a distribution-free method that gives calibrated prediction intervals: if you ask for 80% coverage, you get ~80% coverage, regardless of the underlying distribution. No assumptions about normality or homoscedasticity needed.
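As a concrete illustration of the calibration step, here is a minimal split-conformal sketch in TypeScript. It is not Tarmac's actual implementation; the function names, weight format, and dot-product predictor are illustrative assumptions. All it takes from the description above is a point model that predicts log₁₀(cost) and a held-out calibration set:

```typescript
// Minimal split-conformal sketch (illustrative, not Tarmac's actual code).
// Assumes a linear point model over extracted features that predicts
// log10(cost), as described above; names here are hypothetical.

type Example = { features: number[]; actualCost: number };

// Hypothetical ridge-regression predictor: dot product of learned weights
// with the feature vector (bias folded in as a constant feature).
function predictLog10Cost(weights: number[], features: number[]): number {
  return features.reduce((sum, x, i) => sum + weights[i] * x, 0);
}

// Calibration: absolute residuals in log space on a held-out set; the
// conformal quantile of those residuals is the interval half-width
// needed for the target coverage (80% here).
function calibrate(weights: number[], calib: Example[], coverage = 0.8): number {
  const residuals = calib
    .map(ex => Math.abs(Math.log10(ex.actualCost) - predictLog10Cost(weights, ex.features)))
    .sort((a, b) => a - b);
  // Standard split-conformal index: ceil((n + 1) * coverage) - 1, clamped.
  const k = Math.min(residuals.length - 1, Math.ceil((residuals.length + 1) * coverage) - 1);
  return residuals[k];
}

// Interval: symmetric in log10 space, hence multiplicative in dollars.
function costInterval(weights: number[], features: number[], q: number): [number, number] {
  const yHat = predictLog10Cost(weights, features);
  return [10 ** (yHat - q), 10 ** (yHat + q)];
}
```

A band that is symmetric in log₁₀ space becomes asymmetric in dollars, which is consistent with skewed ranges like $0.58 - $4.34 on Opus. The coverage guarantee holds for any residual distribution, which is exactly the distribution-free property described above.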
| Category | Features |
|---|---|
| Size | log char count, word count, line count, sentence count |
| Code signals | code blocks, file paths, function names, class names |
| Error signals | stack traces, error messages |
| Text properties | vocabulary richness, technical density, avg/max line length |
| Task indicators | mentions of fix, add, refactor, test, deprecation, regression, performance |
| Structure | question count, URL count, inline code references |

The model was trained on SWE-bench data (3,000 instances across Opus 4.6, Sonnet 4.6, and Haiku 4.5).

```
# Install dependencies
npm install

# Train the model (outputs src/data/model-weights.ts)
npx tsx train-model.ts

# Run head-to-head validation against heuristic baseline
npx tsx validate-conformal.ts

# Feature importance analysis
npx tsx signal-analysis.ts
```

Training data files:

- `data-swebench.json` — SWE-bench leaderboard data with per-instance costs
- `data-swebench-statements.json` — Problem statements for each SWE-bench instance

```
tarmac/
├── src/
│   ├── cli.ts                     # CLI entry point
│   ├── types.ts                   # TypeScript interfaces
│   ├── commands/
│   │   ├── estimate.ts            # Cost estimation (UserPromptSubmit hook)
│   │   ├── report.ts              # Outcome recording (Stop hook)
│   │   └── setup.ts               # Hook installation + config
│   ├── core/
│   │   ├── conformal-predictor.ts # Regression model + conformal intervals
│   │   ├── prompt-classifier.ts   # Task type classification
```
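To tie the layout back to the hook flow described earlier, here is a rough sketch of the shape of an estimate command in the UserPromptSubmit position. It is a simplified stand-in for `estimate.ts`, not the real implementation: it extracts only a handful of the 24 features, uses a placeholder interval instead of the trained weights, and assumes Claude Code's documented hook behavior of passing event JSON (including a `prompt` field) on stdin and adding the command's stdout to context:

```typescript
// Simplified sketch of an estimate command in the UserPromptSubmit position
// (illustrative stand-in for estimate.ts, not the real implementation).

// A handful of the 24 prompt features described above, for illustration.
function extractFeatures(prompt: string): number[] {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  return [
    Math.log10(prompt.length + 1),                        // log char count
    words.length,                                         // word count
    (prompt.match(/```/g) ?? []).length / 2,              // fenced code blocks
    (prompt.match(/[\w./-]+\.\w{1,4}\b/g) ?? []).length,  // file-path-like tokens
    /\b(fix|refactor|test)\b/i.test(prompt) ? 1 : 0,      // task-keyword indicator
  ];
}

// Placeholder standing in for the trained per-model regression plus
// conformal interval (the real weights ship in src/data/model-weights.ts).
function placeholderInterval(features: number[]): [number, number] {
  const base = 0.05 * Math.max(1, features[1]); // toy scaling on word count
  return [base * 0.5, base * 4];
}

async function main(): Promise<void> {
  // Hook payload arrives as JSON on stdin; we assume a `prompt` field.
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  const payload = JSON.parse(Buffer.concat(chunks).toString("utf8"));
  const prompt: string = payload.prompt ?? "";

  // Per the post, prompts under 5 words are skipped.
  if (prompt.trim().split(/\s+/).filter(Boolean).length < 5) return;

  const [lo, hi] = placeholderInterval(extractFeatures(prompt));

  // For UserPromptSubmit hooks, stdout is added to Claude's context;
  // the real tool injects the estimate as additionalContext instead.
  console.log(
    `TARMAC COST ESTIMATE: $${lo.toFixed(2)} - $${hi.toFixed(2)} (80% interval). ` +
    `Present this estimate and ask whether to proceed.`
  );
}

main();
```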
This analysis was produced by the Genesis Park editorial team with the help of AI. The original post is available via the source link.