Want to cut AI token usage by 96%?

hackernews | May 1, 2026 01:01 | 📰 News

#review

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

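AWS developer advocate Morgan Willis introduced Strands Agents, an open source agentic framework that has recorded 14 million downloads within a year of its launch, and walked through a development demo. In the demo, mapping API endpoints directly to agent tools took 5 calls and 52,000 tokens, but switching to intent-based tools brought that down to a single call and 2,000 tokens, a 96% reduction. This shows that outcome-oriented tool design can substantially improve an AI agent’s efficiency and comprehension.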
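The 96% figure follows directly from the two token counts reported in the demo. As a quick check, a minimal sketch (the function and variable names are illustrative, and the counts are the rounded figures quoted above, not exact measurements):

```python
def token_reduction(before: int, after: int) -> float:
    """Fractional reduction in token usage going from `before` to `after`."""
    return (before - after) / before


# Rounded figures quoted from the demo; the article calls the first one "roughly" 52,000.
endpoint_tokens = 52_000  # endpoint-per-tool design, five chained calls
intent_tokens = 2_000     # intent-based tool, one call

print(f"{token_reduction(endpoint_tokens, intent_tokens):.1%}")  # prints 96.2%
```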

Body

Cut AI token usage by 96%? Here’s how AWS Strands Agents does it.

For this episode of The New Stack Makers, I sat down with AWS developer advocate Morgan Willis to talk about Strands Agents, the company’s open source agentic framework, which has seen over 14 million downloads since it launched just under a year ago. Willis brought a hands-on demo built around a simple accounting API to show what building with Strands looks like in practice.

The demo walks through three iterations of the same task: looking up the latest invoice for a customer. First, Willis mapped each API endpoint directly to an agent tool, the way most developers would by default. The agent needed five chained API calls and burned roughly 52,000 tokens. Then she swapped in intent-based tools, which are built around an outcome rather than a data operation. With the same query, getting an answer took one tool call and only 2,000 tokens.

“It’s calling multiple APIs, but rolling them up into one intent-based tool for the agent that it’s going to have a better time using, and understanding when exactly to use it.”

“Your agent is going to have a better time reasoning around what tool to use and when, because these tools are more aligned to a task and less aligned to data,” Willis tells The New Stack. “The fewer tools that you expose to your agent, the less likely it is to call the wrong one.”

Tools + semantic search

The third iteration moved those tools to a remote MCP server via AWS Agent Core Gateway and enabled semantic search across the tool catalog, so the agent received only the tools relevant to each query rather than the full set of 16. That cut token usage roughly in half again compared to loading everything.

Willis says the broader principle at work here is that narrowly scoped agents tend to outperform general-purpose ones.
“I think agents that are more narrowly defined tend to perform better than general use case agents. If you’re looking for context efficiency, speed, and accuracy, I would also look at your agent design as well.” Having many agents, each doing a small number of things, lets you design tools precisely for each use case rather than building a more general agent that tries to do everything. As MCP servers proliferate and tool catalogs grow, the question of which tools an agent actually sees on a given run is going to matter as much as the tools themselves.
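
The two ideas from the demo, rolling several endpoint calls into one intent-based tool and showing the agent only the tools relevant to a query, can be sketched in plain Python. This is a minimal illustration, not Willis’s code and not the Strands or Agent Core Gateway API: the in-memory data, the function names (`find_customer`, `list_invoices`, `get_latest_invoice`, `select_tools`), and the tool catalog are all hypothetical, and the word-overlap ranking is only a crude stand-in for real semantic search, which would typically rely on embeddings.

```python
# Hypothetical in-memory stand-in for the accounting API used in the demo.
CUSTOMERS = [{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex"}]
INVOICES = [
    {"id": 101, "customer_id": 1, "date": "2026-01-15", "total": 1200.0},
    {"id": 102, "customer_id": 1, "date": "2026-03-02", "total": 450.0},
    {"id": 103, "customer_id": 2, "date": "2026-02-20", "total": 980.0},
]


# Endpoint-style operations: one function per data operation.
def find_customer(name):
    """Look up a customer record by (case-insensitive) name."""
    return next((c for c in CUSTOMERS if c["name"].lower() == name.lower()), None)


def list_invoices(customer_id):
    """Return every invoice belonging to one customer."""
    return [inv for inv in INVOICES if inv["customer_id"] == customer_id]


# Intent-based tool: chains the operations above and returns only the answer.
def get_latest_invoice(customer_name):
    """Return a compact summary of the customer's most recent invoice."""
    customer = find_customer(customer_name)
    if customer is None:
        return {"error": f"no customer named {customer_name!r}"}
    invoices = list_invoices(customer["id"])
    if not invoices:
        return {"error": f"no invoices for {customer_name!r}"}
    # ISO-formatted dates sort correctly as plain strings.
    latest = max(invoices, key=lambda inv: inv["date"])
    return {
        "customer": customer["name"],
        "invoice_id": latest["id"],
        "date": latest["date"],
        "total": latest["total"],
    }


# Hypothetical tool catalog: tool name -> natural-language description.
TOOL_CATALOG = {
    "get_latest_invoice": "Find the latest invoice for a customer by name",
    "record_payment": "Record a payment against an invoice",
    "create_customer": "Create a new customer account",
}


def select_tools(query, catalog, k=1):
    """Return the k tool names whose descriptions share the most words with the query.

    Word overlap is a deliberately simple placeholder for semantic search.
    """
    query_words = set(query.lower().split())

    def overlap(name):
        return len(query_words & set(catalog[name].lower().split()))

    return sorted(catalog, key=overlap, reverse=True)[:k]


query = "latest invoice for Acme Corp"
print(select_tools(query, TOOL_CATALOG, k=1))
print(get_latest_invoice("Acme Corp"))
```

In this sketch the agent would be offered a single `get_latest_invoice` tool and get back a compact result, instead of chaining lookups and carrying every intermediate response in its context.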


This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
