Show HN: Arise – An agent that generates its own tools at runtime when it fails

hackernews | March 16, 2026, 04:41 | 📦 Open source

#ai deal #ai agent #anthropic #claude #gpt-4 #llm #openai #tool generation #middleware #autonomy

Original source: hackernews · Summary and analysis by Genesis Park

Summary

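Arise, a system in which an AI agent generates the tools it needs when it fails at a task, has been released. The agent writes and runs its own code at runtime, which lets it handle complex problems that predefined tools could not solve. The approach is described as moving past the limits of static tools and substantially increasing an agent's autonomy and adaptability.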

Body

Your agent works great on the tasks you planned for. ARISE handles the ones you didn't.

ARISE is a framework-agnostic middleware that gives LLM agents the ability to create their own tools at runtime. When your agent fails at a task, ARISE detects the capability gap, synthesizes a Python tool, validates it in a sandbox, and promotes it to the active library, with no human intervention required.

    pip install arise-ai

    from arise import ARISE
    from arise.rewards import task_success

    arise = ARISE(
        agent_fn=my_agent,       # any (task, tools) -> str function
        reward_fn=task_success,
        model="gpt-4o-mini",     # cheap model for tool synthesis
    )

    result = arise.run("Fetch all users from the paginated API")
    # Agent fails → ARISE synthesizes fetch_all_paginated tool → agent succeeds

    Episode 1 | FAIL | reward=0.00 | skills=2
      Task: "Fetch paginated users with auth"
    Episode 2 | FAIL | reward=0.00 | skills=2
    Episode 3 | FAIL | reward=0.00 | skills=2

    [Evolution triggered — 3 failures on API tasks]
      → Synthesizing 'parse_json_response'... 3/3 tests passed ✓
      → Synthesizing 'fetch_all_paginated'... sandbox fail → refine → 1/1 passed ✓

    Episode 4 | OK | reward=1.00 | skills=4
      Agent now has the tools it needs

- Self-evolving tool library: fail → detect gap → synthesize → sandbox test → promote
- Framework-agnostic: any (task, tools) -> str function, Strands, LangGraph, CrewAI
- Sandboxed validation: subprocess or Docker, adversarial testing, import restrictions
- Distributed mode: S3 + SQS for stateless deployments (Lambda, ECS, AgentCore)
- Skill registry: share evolved tools across projects
- Version control + rollback: SQLite checkpoints, arise rollback
- A/B testing: refined skills tested against originals before promotion
- Web Console: create agents, watch evolution live, inspect evolved code (arise console)
- Dashboard: terminal TUI and web UI for monitoring

| Model | Condition | AcmeCorp (SRE) | DataCorp (Data Eng) |
|---|---|---|---|
| Claude Sonnet | ARISE | 78% | n/a |
| Claude Sonnet | No tools | 63% | n/a |
| GPT-4o-mini | ARISE | 57% | 92% |
| GPT-4o-mini | No tools | 48% | 50% |

ARISE improves task success by 9 to 42 percentage points across models and domains. See the full benchmark results.
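The fail → detect gap → synthesize → sandbox test → promote cycle listed above can be pictured with a small toy loop. This is only a sketch under stated assumptions, not ARISE's implementation: `EvolutionLoop`, `synthesize_fn`, `validate_fn`, and `failure_threshold` are hypothetical names, the "synthesized" tool is hard-coded rather than produced by a model, and validation runs in-process rather than in a sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[..., object]


@dataclass
class EvolutionLoop:
    """Toy fail -> synthesize -> validate -> promote loop (hypothetical, not ARISE's API)."""
    agent_fn: Callable[[str, Dict[str, Tool]], str]   # any (task, tools) -> str function
    reward_fn: Callable[[str, str], float]            # (task, output) -> reward in [0, 1]
    synthesize_fn: Callable[[str], Dict[str, Tool]]   # stand-in for LLM tool synthesis
    validate_fn: Callable[[Tool], bool]               # stand-in for sandboxed tests
    failure_threshold: int = 3                        # assumed trigger: N recorded failures
    tools: Dict[str, Tool] = field(default_factory=dict)
    failures: List[str] = field(default_factory=list)

    def run(self, task: str) -> float:
        output = self.agent_fn(task, self.tools)
        reward = self.reward_fn(task, output)
        if reward < 1.0:
            self.failures.append(task)
            if len(self.failures) >= self.failure_threshold:
                self._evolve()
        return reward

    def _evolve(self) -> None:
        # Synthesize candidates for the latest failing task; promote only validated ones.
        for name, tool in self.synthesize_fn(self.failures[-1]).items():
            if self.validate_fn(tool):
                self.tools[name] = tool
        self.failures.clear()


def demo() -> List[float]:
    def agent(task: str, tools: Dict[str, Tool]) -> str:
        # The toy agent succeeds only once a pagination tool is in its library.
        if "fetch_all_paginated" in tools:
            return ",".join(tools["fetch_all_paginated"]([["ann", "bob"], ["cy"]]))
        return "error: no pagination tool"

    def reward(task: str, output: str) -> float:
        return 1.0 if output == "ann,bob,cy" else 0.0

    def synthesize(task: str) -> Dict[str, Tool]:
        # Hard-coded candidate standing in for model-generated code.
        def fetch_all_paginated(pages):
            return [user for page in pages for user in page]
        return {"fetch_all_paginated": fetch_all_paginated}

    def validate(tool: Tool) -> bool:
        return tool([[1], [2, 3]]) == [1, 2, 3]

    loop = EvolutionLoop(agent, reward, synthesize, validate)
    return [loop.run("Fetch all users from the paginated API") for _ in range(4)]


print(demo())  # [0.0, 0.0, 0.0, 1.0]
```

The rewards mirror the episode log shown earlier: three failures, one evolution step, then a success. A real system would replace `synthesize` with a model call and `validate` with isolated test execution.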
A web UI for creating agents, watching evolution live, and inspecting evolved tools:

    arise console
    # Opens http://localhost:8080

- Create agents: pick model, set system prompt, choose reward function
- Live terminal feed: watch episodes and evolution in real time via WebSocket
- Skill inspector: syntax-highlighted code, test suite, performance metrics
- Editable config: change reward function, system prompt, failure threshold on the fly
- All Skills / Evolution Log: global views across all agents

Full documentation at arise-ai.dev:

- Installation: install and configure
- Quick Start: complete evolution loop walkthrough
- How It Works: the 5-step evolution pipeline
- Reward Functions: built-in and custom reward functions
- Safety & Validation: sandbox, adversarial testing, production recommendations
- Distributed Mode: S3 + SQS for stateless deployments
- Framework Adapters: Strands, LangGraph, CrewAI, raw OpenAI/Anthropic
- CLI Reference: all CLI commands
- API Reference: ARISE class, config, types

| Example | Description |
|---|---|
| quickstart_evolution.py | Full evolution loop: agent fails → ARISE evolves tool → agent succeeds |
| quickstart.py | Math agent evolves statistics tools |
| api_agent.py | HTTP agent evolves auth + pagination (mock server) |
| devops_agent.py | DevOps agent evolves log analysis tools |
| strands_agent.py | Strands integration with Bedrock |
| demo/agentcore/ | AgentCore deployment with A2A protocol |

    pip install arise-ai             # core (just pydantic)
    pip install arise-ai[aws]        # + boto3 for distributed mode
    pip install arise-ai[litellm]    # + litellm for multi-provider LLM
    pip install arise-ai[docker]     # + docker sandbox backend
    pip install arise-ai[dashboard]  # + rich, fastapi for dashboard
    pip install arise-ai[otel]       # + opentelemetry for tracing
    pip install arise-ai[all]        # everything

ARISE builds on ideas from LATM, VOYAGER, CREATOR, ADAS, and CRAFT.
ARISE adds the production layer: framework-agnostic integration, sandboxed validation, adversarial testing, version control, distributed deployment, and A/B testing.

License: MIT
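To give a rough idea of what sandboxed validation with import restrictions might involve, the sketch below checks a candidate tool's imports against an allowlist and then runs its tests in a separate Python interpreter with a timeout. Everything here is an assumption made for illustration: `ALLOWED_IMPORTS`, `disallowed_imports`, and `passes_sandbox` are hypothetical names, and the post does not describe how ARISE implements this step.

```python
import ast
import subprocess
import sys
from typing import Set

ALLOWED_IMPORTS: Set[str] = {"json", "math", "re"}  # hypothetical allowlist


def disallowed_imports(source: str, allowed: Set[str] = ALLOWED_IMPORTS) -> Set[str]:
    """Return top-level module names imported by `source` that are not allowlisted."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - allowed


def passes_sandbox(tool_source: str, test_source: str, timeout_s: float = 5.0) -> bool:
    """Reject restricted imports, then run tool + tests in a separate interpreter."""
    if disallowed_imports(tool_source):
        return False
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", tool_source + "\n" + test_source],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    # A failing assert raises in the child process and yields a non-zero exit code.
    return proc.returncode == 0


candidate = "import json\ndef parse_json_response(text):\n    return json.loads(text)\n"
tests = "assert parse_json_response('{\"a\": 1}') == {'a': 1}\n"

print(passes_sandbox(candidate, tests))                  # True
print(passes_sandbox("import os\n" + candidate, tests))  # False: 'os' is not allowlisted
```

A static import check like this is easy to bypass (for example through `__import__`), so it is not a security boundary on its own. That is presumably why the feature list also mentions a Docker backend and adversarial testing.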

View original (hackernews)

This analysis was written by the Genesis Park editorial team with the help of AI. The original post is available through the source link.
