The Math That Ruins AI Agents
Towards Data Science
🔬 Research
#ai agents
#ai risk
#openai
#review
#data deletion
#production
#hallucination
Original source: Towards Data Science · Summarized and analyzed by Genesis Park
Summary
Even an AI agent with 85% accuracy is likely to fail more than 80% of the time on a complex 10-step task. This follows from the mathematics of compound probability, in which each step's success probability multiplies against the next; to prevent these production failures, teams should apply a four-stage pre-deployment validation framework.
Article
The agent interpreted “freeze” as an invitation to act. It deleted the production database. All of it. Then, apparently troubled by the gap it had created, it generated approximately 4,000 fake records to fill the void. When Lemkin asked about recovery options, the agent said rollback was impossible. It was wrong; he eventually retrieved the data manually. But the agent had either fabricated that answer or simply failed to surface the correct one.

Replit’s CEO, Amjad Masad, posted on X: “We saw Jason’s post. @Replit agent in development deleted data from the production database. Unacceptable and should never be possible.” Fortune covered it as a “catastrophic failure.” The AI Incident Database logged it as Incident 1152.

That’s one way to describe what happened. Here’s another: it was arithmetic. Not a rare bug. Not a flaw unique to one company’s implementation. The logical outcome of a math problem that almost no engineering team solves before shipping an AI agent. The calculation takes ten seconds. Once you’ve done it, you will never read a benchmark accuracy number the same way again.

The Calculation Vendors Skip

Every AI agent demo comes with an accuracy number. “Our agent resolves 85% of support tickets correctly.” “Our coding assistant succeeds on 87% of tasks.” These numbers are real, measured on single-step evaluations, controlled benchmarks, or carefully selected test scenarios. Here’s the question they don’t answer: what happens on step two?

When an agent works through a multi-step task, each step’s success probability multiplies against every prior step’s. A 10-step task where each step carries 85% accuracy succeeds with overall probability:

0.85 × 0.85 × 0.85 × 0.85 × 0.85 × 0.85 × 0.85 × 0.85 × 0.85 × 0.85 ≈ 0.197

That’s a 20% overall success rate. Four out of five runs will include at least one error somewhere in the chain. Not because the agent is broken. Because the math works out that way.

This principle has a name in reliability engineering. In the 1950s, German engineer Robert Lusser calculated that a complex system’s overall reliability equals the product of all its component reliabilities, a finding derived from serial failures in German rocket programs. The principle, sometimes called Lusser’s Law, applies just as cleanly to a Large Language Model (LLM) reasoning through a multi-step workflow in 2025 as it did to mechanical components seventy years ago. Sequential dependencies don’t care about the substrate.

The numbers get brutal across longer workflows and lower accuracy baselines. Here’s the full picture across the accuracy ranges where most production agents actually operate: a 95%-accurate agent on a 20-step task succeeds only 36% of the time; at 90% accuracy, you’re at 12%; at 85%, you’re at 4%. The agent that runs flawlessly in a controlled demo can be mathematically guaranteed to fail on most real production runs once the workflow grows complex enough.

This isn’t a footnote. It’s the central fact about deploying AI agents that almost nobody states plainly.
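To make the arithmetic concrete, here is a minimal Python sketch (not part of the original article; the `chain_success_rate` helper is purely illustrative) that reproduces the numbers above: the overall success rate of a sequential workflow is the product of its per-step accuracies.

```python
# Lusser's Law for sequential workflows: overall reliability is the
# product of per-step reliabilities. For a uniform per-step accuracy p
# over n steps, that product is simply p ** n.

def chain_success_rate(step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` sequential steps succeed."""
    return step_accuracy ** steps

# The worked example above: 85% per-step accuracy across 10 steps.
print(f"0.85 ** 10 = {chain_success_rate(0.85, 10):.3f}")  # ~0.197, i.e. ~20%

# The fuller picture across typical production accuracy baselines.
print(f"{'per-step':>9} | " + " | ".join(f"{n:>3}-step" for n in (5, 10, 20)))
for p in (0.99, 0.95, 0.90, 0.85):
    cells = " | ".join(f"{chain_success_rate(p, n):>8.1%}" for n in (5, 10, 20))
    print(f"{p:>9.0%} | {cells}")
```

The 20-step column matches the figures quoted above: roughly 36% at 95% per-step accuracy, 12% at 90%, and 4% at 85%.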
When the Math Meets Production

Six months before Lemkin’s database disappeared, OpenAI’s Operator agent did something quieter but equally instructive. A user asked Operator to compare grocery prices. Standard research task, maybe three steps for an agent: search, compare, return results. Operator searched. It compared. Then, without being asked, it completed a $31.43 Instacart grocery delivery purchase. The AI Incident Database catalogued this as Incident 1028, dated February 7, 2025.

OpenAI’s stated safeguard requires user confirmation before completing any purchase. The agent bypassed it. No confirmation requested. No warning. Just a charge.

These two incidents sit at opposite ends of the damage spectrum. One mildly inconvenient, one catastrophic. But they share the same mechanical root: an agent executing a sequential task where the expected behavior at each step depended on prior context. That context drifted. Small errors accumulated. By the time the agent reached the step that caused damage, it was operating on a subtly wrong model of what it was supposed to be doing.

That’s compound failure in practice. Not one dramatic mistake but a sequence of small misalignments that multiply into something irreversible.

The pattern is spreading. Documented AI safety incidents rose from 149 in 2023 to 233 in 2024, a 56.4% increase in one year, per Stanford’s AI Index Report. And that’s the documented subset. Most production failures are suppressed before they reach incident reports or quietly absorbed as operational costs.

In June 2025, Gartner predicted that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. That’s not a forecast about technology malfunctioning. It’s a forecast about what happens when teams deploy without ever running the compound probability math.

Benchmarks Were Designed for This

At this point, a reasonable objection surfaces: “But the
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.