Giving an AI agent 100,000 tools only makes things worse.
hackernews
🔬 Research
#ai agents
#chatgpt
#review
#tool integration
#productivity
#slack
#automation
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
The author argues that the bottleneck for deploying AI agents in production is not the intelligence of the latest models but their ability to use tools, sharing lessons from supporting 3,000+ integrations. Instead of wasting the context window on thousands of tool descriptions, they maximize efficiency by summarizing each tool in a single line (a "skill") and loading the full details only when needed. They also have the agent write Python code instead of using JSON schemas, which cleanly solves tool composition and scalability, and they manage the agent's persistent memory through markdown files.
Body
Most AI agents demo well and fall apart in production. We've spent the past year building an AI coworker that lives in Slack, connects to your company's tools, and automates real work. Here's what we learned about agent architecture along the way.

Intelligence is not the bottleneck -- tool use is

Every few months a new frontier model drops with a 20% benchmark improvement, and our agent gets smarter overnight without us writing a line of code. That's great, but intelligence was never the real bottleneck. The bottleneck is tool use. An AI that can reason brilliantly about your marketing spend is useless if it can't call the Meta Ads API. An AI that writes perfect status updates is useless if it can't post to Slack. The unlock isn't making the model smarter -- it's giving it hands.

We support ~3,000 integrations, each bringing anywhere from 10 to 100+ individual tools. A single user who connects Notion, Linear, HubSpot, and Gmail might give the agent access to 200+ tools. This is already more than 99% of ChatGPT users ever use, even though ChatGPT also has integrations. The difference between "theoretically supports tools" and "actually connected to your tools" is the difference between a toy and a product. But that raises an obvious question: how do you expose an agent to tens of thousands of potential tools without blowing up its context window?

The context window is prime real estate

The naive approach is to describe every available tool in the system prompt so the model knows what it can do. This is catastrophically wasteful. We went through three iterations:

- Everything in context. Hundreds of tool schemas dumped into the system prompt. Slow, expensive, and the model got confused about which tool to use.
- Search-based discovery. Tools live in files; the agent searches when needed. Problem: the agent doesn't know what it doesn't know. If you ask about the weather, it won't think to grep for a "web search" function.
- One-line summaries with lazy loading.
Each capability gets a single-line description in the system prompt -- we call these "skills" (a pattern that's become common in agent frameworks, though we use it in some novel ways). We have ~18 core skills, plus one for every integration the user connects. When the agent decides it needs one, it reads the full skill file in one step: detailed instructions, code examples, known gotchas, and the right function signatures to call. A user with 50 integrations has ~68 skills, but that's still just 68 lines of context. Maximum discoverability, minimum cost.

The important nuance: when you connect a new integration, the agent explores it first. It tests the available API endpoints, discovers your team's IDs and project names, figures out what works and what doesn't, and writes all of this into a new skill file. The next time any agent invocation needs that integration, it doesn't search the codebase or guess at function signatures -- it reads the skill and immediately knows how to write the right code. This is strictly better than search-based discovery because the agent doesn't need to formulate a query for something it doesn't know exists yet.

The general principle: treat your context window like RAM in a memory-constrained system. Page things in only when needed. Keep the hot path small.

Code is the best tool-calling interface

Standard tool calling (JSON schemas, function calling APIs) works fine for 10-20 tools. It completely breaks down at scale. You can't put 500 tool schemas in context, and even if you could, the model would struggle to pick the right one. Our solution: the agent writes code. Instead of calling a send_email tool through a structured API, it writes a Python script that imports a send_email function and calls it. This sounds like a hack, but it's actually strictly superior:

- Composition. The agent can call three tools in a for loop, filter results with conditionals, and handle errors -- all in one turn.
With structured tool calling, each of these would be a separate round trip.
- Discoverability. The agent can browse a directory of available functions the same way a human developer would. It reads the module, sees the function signatures, and figures out how to use them.
- Scalability. Adding a new tool means adding a Python function with a docstring. No schema changes, no prompt engineering.

LLMs are trained on enormous amounts of code. They're already good at this. Leaning into that strength -- treating the agent as a developer rather than a tool-caller -- was one of our best decisions.

Memory through plain text files

LLMs are stateless. There are many approaches to giving agents memory -- vector databases, RAG pipelines, summary-based context injection, persistent scratchpads. We tried most of them and landed on something surprisingly simple: markdown files on a shared filesystem. When our agent explores a new integration -- say, your Linear account -- it writes down what it learned into that integration's skill file.

The file structure
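As a rough illustration of how the pieces above could fit together -- the one-line skill index in the system prompt, lazy loading of full skill files, and markdown files as persistent memory -- a minimal skill store might look like this (the `skills/` directory, file layout, and helper names are assumptions for illustration, not the product's actual implementation):

```python
from pathlib import Path

# Assumed layout: one markdown file per skill under skills/ (illustrative only).
SKILLS_DIR = Path("skills")

def save_skill(name: str, summary: str, body: str) -> None:
    """Persist what the agent learned about an integration as a markdown skill file."""
    SKILLS_DIR.mkdir(exist_ok=True)
    (SKILLS_DIR / f"{name}.md").write_text(f"# {name}\n\n{summary}\n\n{body}\n")

def skill_index() -> str:
    """Build the system-prompt index: one line per skill, name plus summary."""
    lines = []
    for path in sorted(SKILLS_DIR.glob("*.md")):
        text = path.read_text().splitlines()
        # The first non-empty, non-heading line serves as the one-line summary.
        summary = next((line for line in text if line and not line.startswith("#")), "")
        lines.append(f"- {path.stem}: {summary}")
    return "\n".join(lines)

def load_skill(name: str) -> str:
    """Lazily page the full skill file into context only when the agent needs it."""
    return (SKILLS_DIR / f"{name}.md").read_text()
```

Only the output of `skill_index()` would live in the system prompt; `load_skill()` runs when the agent decides it needs that integration, keeping the hot path small.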
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.