Comparing Agentic RAG and Classic RAG: From Pipelines to Control Loops

Towards Data Science | 💼 Business
#agentic rag #classic rag #rag #tip #comparative analysis #pipeline #control loop
Original source: Towards Data Science · Summarized and analyzed by Genesis Park

Summary

The article offers practical guidance on choosing between a single-pass pipeline (Classic RAG) and an adaptive retrieval loop (Agentic RAG) when building retrieval-augmented generation (RAG) systems, based on the complexity of the use case and its cost and reliability requirements. In particular, it compares the traditional linear pipeline approach with a control-loop approach in which an agent autonomously reasons and iterates, focusing on helping developers decide which approach best fits their situation.

Body

Introduction: Why this comparison matters

RAG began with a straightforward goal: ground model outputs in external evidence rather than relying solely on model weights. Most teams implemented this as a pipeline: retrieve once, then generate an answer with citations. Over the last year, more teams have started moving from that “one-pass” pipeline towards agent-like loops that can retry retrieval and call tools when the first pass is weak. Gartner even forecasts that 33% of enterprise software applications will include agentic AI by 2028, a sign that “agentic” patterns are becoming mainstream rather than niche.

Agentic RAG changes the system structure. Retrieval becomes a control loop: retrieve, reason, decide, then retrieve again or stop. This mirrors the core pattern of “reason and act” approaches, such as ReAct, in which the system alternates between reasoning and action to gather new evidence. However, agents do not enhance RAG without tradeoffs. Introducing loops and tool calls increases adaptability but reduces predictability. Correctness, latency, observability, and failure modes all change when you are debugging a process instead of a single retrieval step.

Classic RAG: the pipeline mental model

Classic RAG is straightforward to understand because it follows a linear process. A user query is received, the system retrieves a fixed set of passages, and the model generates an answer based on that single retrieval. If issues arise, debugging usually focuses on retrieval relevance or context assembly. At a high level, the pipeline looks like this:

- Query: take the user question (and any system instructions) as input
- Retrieve: fetch the top-k relevant chunks (usually via vector search, sometimes hybrid)
- Assemble context: select and arrange the best passages into a prompt context (often with reranking)
- Generate: produce an answer, ideally with citations back to the retrieved passages

What classic RAG is good at

Classic RAG is most effective when predictable cost and latency are priorities. For straightforward “doc lookup” questions such as “What does this configuration flag do?”, “Where is the API endpoint for X?”, or “What are the limits of plan Y?”, a single retrieval pass is typically sufficient. Answers are delivered quickly, and debugging is direct: if outputs are incorrect, first check retrieval relevance and chunking, then review prompt behavior.

Example (classic RAG in practice): A user asks: “What does the MAX_UPLOAD_SIZE config flag do?” The retriever pulls the configuration reference page where the flag is defined. The model answers in one pass, “It sets the maximum upload size allowed per request”, and cites the exact section. There are no loops or tool calls, so cost and latency remain stable.

Where classic RAG hits the wall

Classic RAG is a “one-shot” approach. If retrieval fails, the model lacks a built-in recovery mechanism. That shows up in a few common ways:

- Multi-hop questions: the answer needs evidence spread across multiple sources
- Underspecified queries: the user’s wording is not the best retrieval query
- Brittle chunking: relevant context is split across chunks or obscured by jargon
- Ambiguity: the system may need to ask clarifying questions, reformulate, or explore further before providing an accurate answer

Why this matters: when classic RAG fails, it often does so quietly. The system still provides an answer, but it may be a confident synthesis based on weak evidence.
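To make the pipeline shape concrete, here is a minimal Python sketch of the four steps listed above. It is an illustration under stated assumptions, not the article’s code: the in-memory corpus, the lexical score() stand-in for vector similarity, and the call_llm() placeholder are all hypothetical names introduced for the example.

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        source: str
        text: str

    def call_llm(prompt: str) -> str:
        # Placeholder: a real pipeline would call an LLM API here.
        return f"[model answer grounded in a prompt of {len(prompt)} chars]"

    def score(query: str, chunk: Chunk) -> int:
        # Toy lexical-overlap stand-in for vector (or hybrid) similarity search.
        return len(set(query.lower().split()) & set(chunk.text.lower().split()))

    def retrieve(query: str, corpus: list[Chunk], k: int = 3) -> list[Chunk]:
        # Retrieve: fetch the top-k relevant chunks in a single pass, no retries.
        return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]

    def assemble_context(chunks: list[Chunk]) -> str:
        # Assemble context: arrange the passages into a prompt, tagged for citations.
        return "\n\n".join(f"[{c.source}] {c.text}" for c in chunks)

    def classic_rag(query: str, corpus: list[Chunk]) -> str:
        # Query -> Retrieve -> Assemble context -> Generate, exactly once.
        context = assemble_context(retrieve(query, corpus))
        prompt = f"Answer using only this context and cite sources.\n\n{context}\n\nQ: {query}"
        return call_llm(prompt)

Because there is exactly one retrieval and one generation call, cost and latency are fixed per request, which is the property that makes this shape attractive for the “doc lookup” questions described above.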
Agentic RAG: from retrieval to a control loop

Agentic RAG retains the retriever and generator components but changes the control structure. Instead of relying on a single retrieval pass, retrieval is wrapped in a loop, allowing the system to review its evidence, identify gaps, and attempt retrieval again if needed. This loop gives the system an “agentic” quality: it not only generates answers from evidence but also actively chooses how to gather stronger evidence until a stop condition is met.

A helpful analogy is incident debugging: classic RAG is like running one log query and writing the conclusion from whatever comes back, while agentic RAG is a debug loop. You query, inspect the evidence, notice what’s missing, refine the query or check a second system, and repeat until you’re confident or you hit a time/cost budget and escalate.

A minimal loop is:

- Retrieve: pull candidate evidence (docs, search results, or tool outputs)
- Reason: synthesize what you have and identify what’s missing or uncertain
- Decide: stop and answer, refine the query, switch sources/tools, or escalate

For a research reference, ReAct provides a useful mental model: reasoning steps and actions are interleaved, enabling the system to gather more substantial evidence before finalizing an answer.

What agents add

Planning (decomposition)

The agent can decompose a question into smaller evidence-based objectives. Example: “Why is SSO setup failing for a subset of users?”

- What error codes are we seeing?
- Which IdP configuration is used?
- Is
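To show how the minimal retrieve-reason-decide loop differs from the single-pass pipeline in code, here is a sketch that builds on the classic-RAG example above (reusing Chunk, retrieve(), assemble_context(), and call_llm()). The is_sufficient() and refine_query() heuristics and the MAX_STEPS budget are assumptions made for the example; in a real agent, those judgments would typically be delegated to the model.

    MAX_STEPS = 3  # assumed time/cost budget before the loop stops and escalates

    def is_sufficient(query: str, evidence: list[Chunk]) -> bool:
        # Reason: crude check that every query term is covered by some evidence;
        # a real agent would ask the model to judge coverage and uncertainty.
        text = " ".join(c.text.lower() for c in evidence)
        return all(term in text for term in query.lower().split())

    def refine_query(query: str, evidence: list[Chunk]) -> str:
        # Refine: re-focus the next retrieval on terms the evidence has not covered.
        text = " ".join(c.text.lower() for c in evidence)
        missing = [t for t in query.lower().split() if t not in text]
        return " ".join(missing) or query

    def agentic_rag(query: str, corpus: list[Chunk]) -> str:
        evidence: list[Chunk] = []
        current_query = query
        for _ in range(MAX_STEPS):
            # Retrieve: pull candidate evidence for the current (possibly refined) query.
            evidence.extend(retrieve(current_query, corpus))
            # Decide: stop once the evidence looks sufficient...
            if is_sufficient(query, evidence):
                break
            # ...otherwise refine the query and loop again.
            current_query = refine_query(query, evidence)
        # Generate (or escalate): answer from whatever evidence the loop gathered.
        prompt = f"Answer with citations.\n\n{assemble_context(evidence)}\n\nQ: {query}"
        return call_llm(prompt)

The key structural difference from the classic sketch is the loop and its stop condition: each extra iteration can strengthen the evidence, but it also adds latency and cost, which is exactly the predictability trade-off discussed above.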

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
