How to Build Agentic RAG with Hybrid Search

Towards Data Science | March 14, 2026, 07:22 | 🔬 Research

#agentic #llm #rag #review #vector similarity #hybrid search

Source: Towards Data Science · Summarized and analyzed by Genesis Park

Summary

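The article walks step by step through building an "Agentic RAG" system whose performance is strengthened by hybrid search. It covers the concrete technical approach, and its implications, for designing a system in which an AI agent actively gathers information and reasons over it rather than performing a simple lookup.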

Body

Traditionally, RAG first uses vector similarity to find relevant document chunks in the corpus, then feeds the most relevant chunks into the LLM to generate a response. This works well in many scenarios, since semantic similarity is a powerful way to find relevant chunks.

However, semantic similarity struggles in some scenarios, for example when a user inputs specific keywords or IDs that must be matched exactly for a chunk to be relevant. In these cases vector similarity is not very effective, and you need a better way to find the most relevant chunks. This is where keyword search comes in. Finding relevant chunks with both keyword search and vector similarity is known as hybrid search, which is the topic I'll be discussing today.

Why use hybrid search

Vector similarity is very powerful. It can effectively find relevant chunks in a corpus of documents even if the input prompt has typos or uses synonyms, such as the word lift instead of the word elevator.

However, vector similarity falls short in other scenarios, specifically when searching for exact keywords or identification numbers. The reason is that vector similarity doesn't weigh individual words or IDs especially highly compared to the other words in the text. As a result, keywords and key identifiers are typically drowned out by the surrounding words, which makes it hard for semantic similarity to find the most relevant chunks.

Keyword search, as the name suggests, is very good at matching keywords and specific identifiers. With BM25, for example, if a word exists in only one document and that word is in the user query, that document will be weighed very highly and will most likely be included in the search results.

This is the main reason to use hybrid search: you are able to find more relevant documents when the user includes keywords in their query.
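To make the BM25 point concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is an illustration under stated assumptions rather than the article's implementation: the whitespace tokenizer, the parameter values k1=1.5 and b=0.75, the three sample documents, and the ID "inv-88213" are all invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for this toy corpus.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Terms found in few documents get a large IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical corpus: only the first document contains the ID.
docs = [
    "the invoice inv-88213 was refunded after the elevator repair",
    "the elevator in the lobby was repaired last week",
    "the lift was out of service and the repair took a week",
]
scores = bm25_scores("elevator repair inv-88213", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print("best document index:", best)
print("scores:", [round(s, 3) for s in scores])
```

Because "inv-88213" occurs in a single document, its IDF weight is larger than that of "elevator" or "repair", which each occur in two documents, so the document containing the ID is pushed to the top.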
How to implement hybrid search

There are numerous ways to implement hybrid search. If you want to implement it yourself, you can do the following.

- Implement vector retrieval via semantic similarity as you normally would. I won't cover the exact details in this article because it's out of scope; the main point of this article is the keyword search part of hybrid search.
- Implement BM25 or another keyword search algorithm that you prefer. BM25 is a common default: it builds on TF-IDF and adds term-frequency saturation and document-length normalization. The exact keyword search algorithm you use doesn't matter that much, though I recommend BM25 as the standard choice.
- Apply a weighting between the semantic similarity score and the keyword search score. You can decide this weighting yourself depending on what you regard as most important. If you have an agent performing the hybrid search, you can also have the agent decide the weighting, as agents typically have a good intuition for when to weigh vector similarity more and when to weigh keyword search more.

There are also packages you can use to achieve this, such as the TurboPuffer vector store, which has keyword search built in. To learn how the system really works, however, I recommend implementing it yourself at least once and checking how well it works on your data.

Overall, hybrid search isn't that difficult to implement. If you're looking into hybrid search, you typically already know how vector search works and only need to add the keyword search element. Keyword search itself is not that complicated either, which makes hybrid search a relatively simple addition that can yield a lot of benefits.

Agentic hybrid search

Implementing hybrid search is great, and it will probably improve how well your RAG system works right off the bat.
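The weighting step from the implementation list can be sketched as follows. This is one possible approach, not the article's code: the min-max normalization, the function names `min_max` and `hybrid_rank`, the `alpha` parameter, and the sample scores are all assumptions made for illustration. The article only says to apply a weighting between the two scores.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: this signal carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(vector_scores, keyword_scores, alpha=0.5):
    """Rank chunks by a weighted mix of vector and keyword scores.

    alpha=1.0 uses only vector similarity, alpha=0.0 only keyword search.
    Returns (chunk indices from best to worst, combined scores).
    """
    if len(vector_scores) != len(keyword_scores):
        raise ValueError("both score lists must cover the same chunks")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = min_max(vector_scores)
    kw = min_max(keyword_scores)
    combined = [alpha * v + (1 - alpha) * k for v, k in zip(vec, kw)]
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return order, combined


# Hypothetical scores for three chunks.
vector_scores = [0.62, 0.80, 0.78]  # e.g. cosine similarities
keyword_scores = [4.1, 0.3, 0.0]    # e.g. BM25 scores

semantic_order, _ = hybrid_rank(vector_scores, keyword_scores, alpha=0.8)
keyword_order, _ = hybrid_rank(vector_scores, keyword_scores, alpha=0.2)
print("alpha=0.8:", semantic_order)
print("alpha=0.2:", keyword_order)
```

Exposing `alpha` as a parameter is what lets an agent shift the balance per query: a low value when the query contains an exact ID, a high value when the query is a loosely worded question.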
However, I believe that if you really want to get the most out of a hybrid search RAG system, you need to make it agentic. By making it agentic, I mean the following.

A typical RAG system first fetches relevant document chunks, feeds those chunks into an LLM, and has it answer the user's question.

An agentic RAG system does it a bit differently. Instead of doing the chunk retrieval before using an LLM to answer, you make the chunk retrieval function a tool that the LLM can access. Giving the LLM access to a tool is what makes it agentic, and it has several major advantages:

- The agent can itself decide the prompt to use for the vector search. Instead of using only the exact user prompt, it can rewrite the prompt to get even better vector search results. Query rewriting is a well-known technique for improving RAG performance.
- The agent can fetch information iteratively: it can first make one vector search call, check whether it has enough information to answer the question, and if not, fetch more information. This makes

View original (Towards Data Science)

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
