Newsfeed Curation SNS Dashboard Journal

Failure Modes of Agentic RAG: Retrieval Thrash, Tool Storms, Context Bloat (and How to Catch Them Early)

Towards Data Science | 💼 Business
#agentic rag #rag failures #tip #retrieval thrash #tool storms #context bloat

Summary

This article analyses the main reasons agentic RAG (Retrieval-Augmented Generation) systems fail silently in production and run up massive cloud bills. It introduces the signature failure modes — Retrieval Thrash, Tool Storms, and Context Bloat — and presents concrete ways to detect and resolve these problems early, before the invoice arrives.

Why It Matters

Developer Perspective

Under review

Researcher Perspective

Under review

Business Perspective

Under review

Body

Agentic RAG fails differently because the system shape is different. It is not a pipeline; it is a control loop: plan → retrieve → evaluate → decide → retrieve again. That loop is what makes it powerful for complex queries, and it is exactly what makes it dangerous in production. Every iteration is a new opportunity for the agent to make a bad decision, and bad decisions compound.

Three failure modes show up repeatedly once teams move agentic RAG past prototyping:

- Retrieval thrash: the agent keeps searching without converging on an answer
- Tool storms: excessive tool calls that cascade and retry until budgets are gone
- Context bloat: the context window fills with low-signal content until the model stops following its own instructions

These failures almost always present as "the model got worse", but the root cause is not the base model. It is missing budgets, weak stopping rules, and zero observability of the agent's decision loop. This article breaks down each failure mode, why it happens, how to catch it early with specific signals, and when to skip agentic RAG entirely.

What Agentic RAG Is (and What Makes It Fragile)

Classic RAG retrieves once and answers. If retrieval fails, the model has no recovery mechanism; it generates the best output it can from whatever came back. Agentic RAG adds a control layer on top. The system can evaluate its own evidence, identify gaps, and try again. The agent loop runs roughly like this: parse the user question, build a retrieval plan, execute retrieval or tool calls, synthesise the results, verify whether they answer the question, then either stop and answer or loop back for another pass. This is the same retrieve → reason → decide pattern described in ReAct-style architectures, and it works well when queries require multi-hop reasoning or evidence scattered across sources.

But the loop introduces a core fragility: the agent optimises locally.
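The loop described above can be sketched as a capped control loop. This is a minimal illustration under stated assumptions, not the article's implementation: the `retrieve`, `verify`, and `synthesise` helpers are hypothetical stand-ins supplied by the caller, and the hard pass cap is the kind of stopping rule the article argues for.

```python
from dataclasses import dataclass

MAX_PASSES = 3  # hard stopping rule, so the loop cannot spiral


@dataclass
class LoopResult:
    answer: str
    passes: int
    exhausted: bool  # True if the cap was hit without a verified answer


def run_agent_loop(question, retrieve, verify, synthesise) -> LoopResult:
    """plan -> retrieve -> evaluate -> decide, with a hard pass budget."""
    evidence = []
    draft = ""
    for attempt in range(1, MAX_PASSES + 1):
        evidence.extend(retrieve(question, attempt))  # execute the retrieval plan
        draft = synthesise(question, evidence)        # synthesise a candidate answer
        if verify(question, draft):                   # verifier says "enough": stop
            return LoopResult(draft, attempt, exhausted=False)
    # Budget exhausted: return the best effort instead of looping forever.
    return LoopResult(draft, MAX_PASSES, exhausted=True)
```

The `exhausted` flag matters as much as the answer: it is the hook for attaching a confidence disclaimer and for counting how often the loop hits its cap in production.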
At each step, the agent asks, "Do I have enough?", and when the answer is uncertain, it defaults to "get more". Without hard stopping rules, that default spirals: the agent retrieves more, escalates, retrieves again, each pass burning tokens without guaranteeing progress. LangGraph's own official agentic RAG tutorial had exactly this bug: an infinite retrieval loop that required a rewrite_count cap to fix. If the reference implementation can loop forever, production systems certainly will. The fix is not a better prompt. It is budgeting, gating, and better signals.

Failure Mode Taxonomy: What Breaks and Why

Retrieval Thrash: The Loop That Never Converges

Retrieval thrash is the agent repeatedly retrieving without settling on an answer. In traces, you see it clearly: near-duplicate queries, oscillating search terms (broadening, then narrowing, then broadening again), and answer quality that stays flat across iterations.

A concrete scenario. A user asks: "What is our reimbursement policy for remote employees in California?" The agent retrieves the general reimbursement policy. Its verifier flags the answer as incomplete because it does not mention California-specific rules. The agent reformulates: "California remote work reimbursement." It retrieves a tangentially related HR document. Still not confident. It reformulates again: "California labour code expense reimbursement." Three more iterations later, it has burned through its retrieval budget, and the answer is barely better than after round one.

The root causes are consistent: weak stopping criteria (the verifier rejects without saying what is specifically missing), poor query reformulation (rewording rather than targeting a gap), low-signal retrieval results (the corpus genuinely does not contain the answer, but the agent cannot recognise that), or a feedback loop where the verifier and retriever oscillate without converging.
Production guidance from multiple teams converges on the same number: cap retrieval cycles at three. After three failed passes, return a best-effort answer with a confidence disclaimer.

Tool Storms and Context Bloat: When the Agent Floods Itself

Tool storms and context bloat tend to occur together, and each makes the other worse. A tool storm occurs when the agent fires excessive tool calls: cascading retries after timeouts, parallel calls returning redundant data, or a "call everything to be safe" strategy when the agent is uncertain. One startup documented agents making 200 LLM calls in 10 minutes, burning $50–$200 before anyone noticed. Another saw costs spike 1,700% during a provider outage as retry logic spiralled out of control.

Context bloat is the downstream result. Massive tool outputs are pasted directly into the context window: raw JSON, repeated intermediate summaries, growing memory until the model's attention is spread too thin to follow instructions. Research consistently shows that models pay less attention to information buried in the middle of long contexts. Stanford and Meta's "Lost in the Middle" study found performance drops of 20+ percentage points.
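One guard against context bloat is to clip raw tool outputs before they enter the context window, rather than pasting them in whole. A minimal sketch; the 1,000-character limit and the helper names are illustrative assumptions, and a production version would more likely clip by tokens or summarise:

```python
MAX_CHARS_PER_TOOL_OUTPUT = 1_000  # illustrative cap, tune per model


def clip_tool_output(output: str, limit: int = MAX_CHARS_PER_TOOL_OUTPUT) -> str:
    """Return the output unchanged if small; otherwise keep the head and
    mark the truncation so the model knows content was dropped."""
    if len(output) <= limit:
        return output
    return output[:limit] + f"\n[...truncated {len(output) - limit} chars]"


def build_context(system_prompt: str, tool_outputs: list[str]) -> str:
    """Assemble the prompt from clipped tool outputs instead of raw dumps."""
    clipped = [clip_tool_output(o) for o in tool_outputs]
    return "\n\n".join([system_prompt, *clipped])
```

The explicit truncation marker is deliberate: an agent that can see "content was dropped" can decide to re-query narrowly, instead of silently reasoning over an incomplete blob.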
