Agentic AI requires compute that tokens alone cannot measure
🔬 Research
#ai strategy
#review
#enterprise adoption
#agentic ai
#compute
#token economics
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
Alan Jacobson, an AI economics strategist, points out that as enterprise adoption of agentic AI grows, a serious gap is opening between "token" counts, the traditional yardstick for cost, and the actual "compute" work performed. Agentic AI burns through compute resources at an exponential rate as it moves through planning, retrieval, and parallel execution steps, but that workload is not accurately reflected in token counts, and companies are being hit with unexpected cost blowouts. Because tokens are merely an output measure while real cost arises during execution, he argues, the industry must move beyond token-centric metrics to a new cost-management model based on compute workload.
Full text
By Alan Jacobson, AI Economics Strategist

This week, stories in The New York Times and The Wall Street Journal highlighted something that has been quietly building inside companies: employees are deploying AI agents that generate massive volumes of tokens, and massive, unexpected costs.

The assumption behind most AI systems is simple: tokens approximate usage, and usage approximates cost. That assumption is now breaking, because:

- Tokens measure text.
- Agents don't generate a commensurate amount of text for the work they perform.
- So agents use compute that isn't reflected in the tokens they generate.

If tokens don't track compute, pricing can't track cost, and profit margins compress.

**Workflow A: Simple prompt/response**

1. User prompt: "Summarize this document". Tokens generated: 25. Compute: single inference pass.
2. Model produces summary. Tokens generated: 200. Compute: single inference pass.

Total tokens: 225. Actual work: 1 pass.

**Workflow B: Agentic workflow**

1. User prompt: "Summarize this document". Tokens generated: 25. Compute: initial inference.
2. Agent retrieves 5 documents for context. Tokens generated: 0. Compute: embedding + vector search + ranking.
3. Agent evaluates relevance, discards 3 docs, keeps 2. Tokens generated: 0. Compute: multiple inference passes.
4. Agent attempts first summary draft, determines it is insufficient. Tokens generated: 0. Compute: inference pass.
5. Agent retries with a different prompt strategy. Tokens generated: 0. Compute: inference pass.
6. Agent calls external metadata tool. Tokens generated: 0. Compute: API call + processing.
7. Agent produces final summary. Tokens generated: 200. Compute: final inference pass.

Total tokens: 225. Actual work: 5-7 inference passes + retrieval + tool use.

**What billing sees vs. what actually happened**

| | Workflow A | Workflow B |
|---|---|---|
| Tokens billed | 225 | 225 |
| Inference passes | 1 | 5-7 |
| Retrieval ops | 0 | multiple |
| Tool/API calls | 0 | 1+ |
| Actual compute | Low | Significantly higher |

If you are counting tokens, you are flying blind for two reasons:

- Semantic blindness: Tokens count words, but they don't understand meaning. A simple request and a complex task may use the same number of tokens while requiring very different amounts of compute.
- Invisibility: Agentic workflows perform work off-screen (retrieval, tool use, iteration) that never becomes tokens at all.

So tokens measure neither the meaning of the work, nor the full amount of work performed. You can count tokens, but you can't count on them to measure, provision, bill, or control compute.

To put it in historical context: using tokens to measure compute is like measuring electricity in horsepower. It stopped working once machines replaced horses. Tokens are the horsepower of AI. Compute is the kilowatt-hour.

In early AI systems, simple prompt and response, that approximation mostly held. One request triggered one model pass. Tokens were an inaccurate, albeit workable, proxy for compute.

Agentic AI changes that. A single request now triggers multiple steps:

- planning
- retrieval
- tool use
- validation
- retries
- sub-agents running in parallel

Each step requires compute. Each step reprocesses context. Each step adds work. But not all of that work shows up proportionally in token counts.

The result: compute grows with execution depth. Tokens do not. This is the disconnect now surfacing in real-world deployments.

Tokens have never been an accurate proxy for compute. They were merely easy to count.
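The gap in the comparison above can be made concrete with a small tally. This is an illustrative sketch only: the step names mirror the workflows described, but the per-step "compute units" are made-up placeholder costs, not real measurements from any billing system.

```python
# Same 225 billed tokens, very different amounts of execution.
# Compute units per step are illustrative placeholders, not real prices.

WORKFLOW_A = [
    # (step, tokens_generated, compute_units)
    ("user prompt",      25, 1),  # single inference pass
    ("produce summary", 200, 1),  # single inference pass
]

WORKFLOW_B = [
    ("user prompt",              25, 1),  # initial inference
    ("retrieve 5 documents",      0, 3),  # embedding + vector search + ranking
    ("evaluate relevance",        0, 2),  # multiple inference passes
    ("first draft (rejected)",    0, 1),  # inference pass
    ("retry with new strategy",   0, 1),  # inference pass
    ("call metadata tool",        0, 1),  # API call + processing
    ("final summary",           200, 1),  # final inference pass
]

def tally(workflow):
    """Return (tokens billed, compute units consumed) for a workflow."""
    tokens = sum(t for _, t, _ in workflow)
    compute = sum(c for _, _, c in workflow)
    return tokens, compute

tok_a, comp_a = tally(WORKFLOW_A)
tok_b, comp_b = tally(WORKFLOW_B)

print(f"A: tokens billed={tok_a}, compute units={comp_a}")  # 225 tokens, 2 units
print(f"B: tokens billed={tok_b}, compute units={comp_b}")  # 225 tokens, 10 units
```

A token meter reports identical usage for both workflows; only a meter that counts execution steps sees the 5x difference.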
But in a world of compound execution, the gap becomes impossible to ignore. A system can:

- reprocess the same context multiple times
- execute chains of model calls
- spawn parallel tasks
- run validation and retry loops

All of which consume compute, without a clean, proportional increase in tokens.

So while organizations can:

- count tokens
- monitor usage
- even reduce spend

they still cannot measure or control the underlying compute driving cost. And if cost cannot be measured correctly, it cannot be priced or controlled.

That's why this is surfacing now. Not because AI suddenly became expensive, but because agentic AI multiplies compute in ways tokens were never designed to capture.

The industry is still measuring output. The cost is coming from execution. Tokens just show the tip of the iceberg.

– Published on Sunday, March 22, 2026
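The claim that compute grows with execution depth while tokens stay flat can be sketched with a toy model. The assumption here, which the article implies but does not quantify, is that each agent step re-reads the full accumulated context, so tokens *processed* grow roughly quadratically with depth while tokens *billed* (prompt in, answer out) stay constant. All numbers are placeholders.

```python
# Toy model of "compute grows with execution depth; tokens do not."
# Assumption: every step reprocesses all context accumulated so far.

PROMPT_TOKENS = 25
OUTPUT_TOKENS = 200
CONTEXT_PER_STEP = 500  # retrieved docs, tool output, etc. added each step

def billed_tokens(depth: int) -> int:
    """What a token meter sees: just the prompt and the final answer."""
    return PROMPT_TOKENS + OUTPUT_TOKENS

def processed_tokens(depth: int) -> int:
    """What the hardware sees: each step re-reads all prior context."""
    total = 0
    context = PROMPT_TOKENS
    for _ in range(depth):
        total += context            # this step reprocesses everything so far
        context += CONTEXT_PER_STEP # and leaves more context for the next step
    return total + OUTPUT_TOKENS

for depth in (1, 3, 7):
    print(f"depth={depth}: billed={billed_tokens(depth)}, "
          f"processed={processed_tokens(depth)}")
```

At depth 1 the two numbers coincide, which is why token billing once looked workable; by depth 7 the processed count is dozens of times the billed count, which is the iceberg the article describes.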
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.