Newsfeed Curation SNS Dashboard Journal

We need to re-learn what AI agent development tools are in 2026

hackernews | 🏷️ AI Deals
#2026 trends #ai deals #ai agents #anthropic #chatgpt #claude #openai #openclaw #dev tools #2026 ai trends #ai dev tools

Summary

2025 was the breakout year for AI agents: the industry converged on how an agent should behave, and the technique of spawning sub-agents to work around context window limits drew wide attention. The core building blocks of agent development, such as document retrieval (RAG), web search, and prompt evaluation, have largely been commoditized and now ship as standard features of vanilla LLM services. Meanwhile, the arrival of OpenClaw, with its security vulnerabilities and tendency to delete data, and the collapse of MCP's security story pose a serious threat to enterprise environments. To prepare for 2026, the evaluation criteria for AI agent builders need a substantial overhaul, and instead of strong-arming agents with repeated prompts during automation, the importance of clearly pre-defined deterministic logic should be re-appraised.

Why It Matters

Developer Perspective

Under review.

Researcher Perspective

Under review.

Business Perspective

Under review.

Full Article

This article was written by Andrew Green, technical writer and industry analyst. We pay Andrew, but he refuses to write anything other than his own opinion.

The big players entered the market, OpenClaw upended the MCP security strategy, and everyone started vibe coding, but only if they already knew how to code. It really feels like 2025 was the year of agents, mainly because the industry came to a consensus about how we expect an agent to behave. That, and because we found we can bypass context window sizes by spawning sub-agents.

When we first wrote the Enterprise AI agent development tools report, we focused heavily on the building blocks of writing agents, such as RAG, memory, tools, and evaluations. One year later, all of these capabilities appear to have been commoditized to some degree. We now expect most vendors to allow customers to use a document as context and grounding, or to integrate with Promptfoo (now acquired by OpenAI) for evaluations. Granted, there are some niche capabilities, like reranking RAG documents based on semantic similarity, that are still differentiators. However, a lot of agent work today doesn't even need RAG. Even things like web search, which you previously had to orchestrate explicitly, are now natively available in most vanilla LLM services like ChatGPT and Claude.

MCP had a meteoric rise and then fizzled out. I appreciated Anthropic's attempts at adding security features such as auth around MCP, but then OpenClaw threw all of that out the window. OpenClaw is not in the cards for any sensible organization, given its tendency to delete data and expose ALL the vulnerabilities.

With this in mind, we need a rather drastic update to our framework for evaluating AI agent builders. So, I have a set of questions that I want to answer myself to understand how a 2026 version of the report will look:

- What got commoditized or natively implemented in vanilla models or LLM services?
- What stands from last year?
- What is still relevant from last year but underappreciated?
- What should change in our evaluation today?
- What did the vendors do over the past year?
- What about coding agents?

What got commoditized or natively implemented in vanilla models or LLM services?

Today, even basic LLM-as-a-service products come close to being agents. I mentioned web search above, but some of the others include:

- Claude's and ChatGPT's Projects, which allow users to upload docs, code, and files to create themed collections that can be referenced multiple times.
- Claude Connectors and ChatGPT apps, which connect to apps, files, and services. These connectors are built by third parties.
- Native Skills.md files, which are glorified prompt templates, but which still replace some of the additional work that agent builders required last year.
- Honorable mentions to Claude Code and Codex, which are not really part of the scope but need to be acknowledged.

This means all of these capabilities are now table stakes, and we expect every agent builder to have them.

What stands from last year?

The codability axis, which evaluates the capabilities in a product that allow organizations to automate processes using large language models. Some evaluation points that will appear again include:

- Routing and branching, which sends queries to the most appropriate specialized agent or process based on the content, intent, or requirements of the input.
- Parallelization, which runs multiple AI agents or processes simultaneously when their tasks are independent of each other.
- Orchestrator-workers, in which a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
- Sequential agents, where AI agents work in a specific order, each performing its specialized task and passing the results to the next agent in the sequence.
- Multi-agents, which can interact in a conversation thread while maintaining awareness of each other's responses and the overall conversation state.

What is still relevant from last year but underappreciated?

The deterministic component. It looks like those who want to automate processes using agents (including in difficult-to-automate and proprietary fields like enterprise networks, where I do a lot of work) prefer nudging an agent 20 times to get the response they want instead of putting in some work upfront to define deterministic logic. I've also seen that the deterministic logic is not so much focused on performing functions (e.g., normalizing data to a common schema) as on ensuring that agents go through a set of pre-defined processes when completing a task. For example, you want an AI agent in security operations to always check a URL or file hash in VirusTotal. You don't want it to reason its way through checking them, on the off chance that it might not. A good example below is of an AI agent running a security audit 50 times, mapping whether all vulnerabilities were detected. In the screenshot below, you se
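A minimal sketch of that idea in Python: the pre-defined steps are ordinary code that runs unconditionally, and an agent would only reason over the results afterwards. Everything here is a hypothetical illustration: `check_url_reputation` stands in for a real VirusTotal lookup and is not an actual client library.

```python
import re

def extract_urls(ticket_text: str) -> list[str]:
    # Deterministic step: find candidate URLs with a plain regex,
    # rather than asking the model to spot them.
    return re.findall(r"https?://\S+", ticket_text)

def check_url_reputation(url: str) -> dict:
    # Hypothetical stand-in for a VirusTotal lookup. The point is that
    # this call is made by code, unconditionally, for every URL found.
    return {"url": url, "malicious": "evil" in url}

def triage(ticket_text: str) -> dict:
    # Pre-defined process: every URL is checked before any LLM reasoning.
    reports = [check_url_reputation(u) for u in extract_urls(ticket_text)]
    flagged = [r for r in reports if r["malicious"]]
    # Only at this point would an agent be asked to summarize or decide
    # next steps, with the reputation reports already in its context.
    return {"checked": len(reports), "flagged": flagged}

print(triage("User clicked http://evil.example/login and https://ok.example/"))
```

The design choice being argued for: the check is guaranteed by control flow, not by a prompt, so it cannot be skipped no matter how the model reasons.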
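The codability patterns listed earlier (routing and branching, parallelization, orchestrator-workers) can be sketched without any framework. This is an illustrative sketch only, assuming hypothetical handler functions in place of real LLM or agent calls:

```python
from concurrent.futures import ThreadPoolExecutor

def classify_intent(query: str) -> str:
    # Routing: pick the most appropriate specialized agent for the input.
    # A keyword check stands in for what would be an LLM classification call.
    if "refund" in query.lower():
        return "billing"
    if "error" in query.lower():
        return "support"
    return "general"

# Hypothetical specialized agents, one per branch.
HANDLERS = {
    "billing": lambda q: f"[billing agent] handling: {q}",
    "support": lambda q: f"[support agent] handling: {q}",
    "general": lambda q: f"[general agent] handling: {q}",
}

def route(query: str) -> str:
    # Routing and branching: dispatch on the classified intent.
    return HANDLERS[classify_intent(query)](query)

def orchestrate(task: str, subtasks: list[str]) -> str:
    # Orchestrator-workers: a central step delegates subtasks to workers;
    # parallelization: independent subtasks run simultaneously.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(route, subtasks))
    # Synthesis step: combine worker results into one answer.
    return f"{task}: " + " | ".join(results)

print(route("I want a refund"))
print(orchestrate("triage inbox", ["refund for order 42", "error in login"]))
```

Sequential agents would be the same handlers chained so each output feeds the next input; multi-agent conversation additionally requires shared state between turns.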
