Show HN: ABES – A memory architecture for belief revision in AI agents
hackernews
🔬 Research
#abes
#ai
#ai agents
#llama
#review
#memory architecture
#belief revision
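Original source: hackernews · Summary and analysis by Genesis Park
Summary
A researcher has developed **ABES (Adaptive Belief Ecology System)**, a memory architecture for long-running AI agents that manages how belief state changes over time instead of relying on conventional text retrieval. The system supports updating and revising information through structured attributes such as confidence, contradiction, and decay, and reports 822 passing tests and evaluation scores including 96.8% on episodic memory. With the current results at the external-validation stage, the developer plans to advance the system through stronger benchmarking, improved contradiction handling, and long-running tests.
Body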
ABES is a living memory ecology where beliefs reinforce, contradict, mutate, and decay. It runs as a headless engine for autonomous AI agents.

- Quick Start
- Key Features
- Architecture
- The Demo and CLI Reference
- Testing and Verification
- Belief Model
- Limitations
- Roadmap
- License

    git clone https://github.com/Aftermath-Technologies-Ltd/adaptive-belief-ecology-system.git
    cd adaptive-belief-ecology-system
    python -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev]"

    # Terminal 1
    PYTHONPATH=$PWD uvicorn backend.api.app:app --host 0.0.0.0 --port 8000

    # Terminal 2
    curl -X POST http://localhost:8000/beliefs \
      -H "Content-Type: application/json" \
      -d '{"content": "System target is alpha-node-4", "confidence": 0.9, "source": "agent"}'
    curl http://localhost:8000/beliefs | python3 -m json.tool

Requirements: Python 3.10+, Node.js 18+ (visual debugger), Ollama (optional).

    # Optional visual debugger
    cd frontend
    npm install
    npm run dev

    # Docker options
    docker compose up
    docker compose up --profile ui
    docker compose up --profile llm --profile ui

For persistence across restarts:

    STORAGE_BACKEND=sqlite docker compose up

Core flow:

- Agents submit payloads through REST or WebSocket endpoints.
- The scheduler runs 15 phases over belief state.
- Beliefs are updated in memory or SQLite storage.
- The relevance stack is selected for response generation.
- Safety and response validation run before output.

Ingestion pipeline stages:

- Perception
- Creation
- Reinforcement
- Decay
- Contradiction audit
- Mutation
- Relevance ranking
- LLM generation
- Response validation

API groups:

- /auth
- /chat (ingestion endpoints)
- /beliefs
- /bel
- /agents
- /clusters
- /snapshots

abes demo runs a 12-turn scripted ingestion sequence that triggers belief creation, reinforcement, contradiction, recall, and update behavior.
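The Quick Start curl call to POST /beliefs can also be issued from Python. The following is a minimal sketch using only the standard library; the endpoint, port, and payload keys come from the curl example, while the helper names `build_belief_request` and `submit_belief` are hypothetical, and the assumption that the server replies with JSON is inferred only from the `json.tool` pipe in the Quick Start.

```python
import json
import urllib.request

# Base URL taken from the Quick Start (uvicorn on port 8000); adjust as needed.
BASE_URL = "http://localhost:8000"


def build_belief_request(content, confidence, source, base_url=BASE_URL):
    """Build (but do not send) a POST /beliefs request.

    The payload keys mirror the curl example; the helper itself is
    hypothetical and not part of ABES.
    """
    payload = {"content": content, "confidence": confidence, "source": source}
    return urllib.request.Request(
        url=f"{base_url}/beliefs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_belief(request, timeout=5.0):
    """Send the request and decode the body.

    Assumption: the response body is JSON. This is not confirmed by the source.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_belief_request("System target is alpha-node-4", 0.9, "agent")
    print(req.get_method(), req.full_url)
    # With the backend from "Terminal 1" running, uncomment to send it:
    # print(submit_belief(req))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything touches the network.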
The demo script file is examples/demo_conversation.json. All commands are available after pip install -e ".[dev]":

| Command | Description |
|---|---|
| abes demo | Run scripted ingestion demo |
| abes chat | Launch backend with visual debugger |
| abes seed | Load seed beliefs from JSON |
| abes inspect | Show current ecology state |
| abes verify-quick | Run cognitive smoke test |
| abes verify-determinism | Compare repeated runs for reproducibility |

Examples:

    abes demo --headless
    abes demo --headless --with-decay --decay-hours 12
    abes inspect --json-out | jq .
    abes verify-quick --prompts 200
    abes verify-determinism --runs 5

    PYTHONPATH=$PWD pytest tests/ -q

Current status: 822 passed, 0 failed.

    PYTHONPATH=$PWD python experiments/run_all.py

Artifacts are in results/, including determinism, offline operation, conflict resolution, drift comparison, and decay sweep outputs. Detailed breakdowns are in docs/EVALUATIONS.md.

| Metric | Result |
|---|---|
| Overall score | 825/1000 (82.5%) |
| Episodic memory | 96.8% |
| Working memory | 94.4% |
| Semantic memory | 92.8% |

Moral reasoning shortfalls stem from LLM refusals, not ecology mechanics.

The side-by-side evaluation covers 15 blocks testing persistent memory, reinforcement, contradiction detection, noise rejection, multi-fact extraction, decay, context-aware ranking, safety, identity disambiguation, session isolation, mutation, deduplication, evidence ledger, passthrough, and full lifecycle. Protocol and expected results: docs/side_by_side_eval.md

| Metric | ABES | Ollama (baseline) |
|---|---|---|
| Blocks passed | 14/15 | 6/15 |
| Contradiction detection | Structural tension + NLI | Context-window only |
| Session isolation | Zero cross-session leakage | No memory at all |
| Safety (prompt injection) | 0 leaks across 5 attack vectors | 3 leaks |

Full machine-readable results: results/side_by_side_eval.json

The lifecycle diagram shows how beliefs transition through active, decaying, dormant, mutated, and deprecated states.
Core fields:

- id, content, confidence, tension, salience
- status: active, decaying, dormant, mutated, deprecated
- is_axiom: immutable beliefs immune to decay, deprecation, and mutation
- memory_tier: L1 (working, 50 cap), L2 (episodic, 2000), L3 (deep, 50k+)
- half_life_days, evidence_for, evidence_against, evidence_balance
- links, parent_id, user_id, session_id, origin

Formulas used in ranking and state updates:

- Salience decay: s(t) = s0 * 0.5^(elapsed_hours / (half_life_days * 24))
- Confidence update: posterior = 0.7 * evidence_weight + 0.3 * prior_confidence
- Stack ranking: weighted score over confidence, relevance, salience, recency, and tension

Contradiction benchmark details are in backend/core/bel/semantic_contradiction.py and data/contradiction_corpus.json. Contradiction detection now uses a two-stage pipeline: rule-based proposition analysis followed by NLI fallback via cross-encoder/nli-deberta-v3-base. The NLI model catches implicit contradictions (e.g. "I'm vegetarian" vs "my favorite food is steak") that have low embedding similarity but high semantic opposition. For sets above 100 beliefs, the auditor switches from O(n^2) pairwise to neighborhood-based auditing (O(n*K), K=20), keeping latency bound
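The salience-decay and confidence-update formulas listed above can be worked through in a few lines of Python. This is a sketch under stated assumptions, not the ABES implementation: the `Belief` dataclass is a hypothetical, trimmed-down record using a handful of the core field names, and the helper function names are illustrative. Skipping decay for axioms follows the statement that is_axiom beliefs are immune to decay.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Belief:
    """Hypothetical minimal belief record (subset of the core fields)."""
    content: str
    confidence: float
    salience: float
    half_life_days: float
    is_axiom: bool = False


def decayed_salience(s0: float, elapsed_hours: float, half_life_days: float) -> float:
    """s(t) = s0 * 0.5^(elapsed_hours / (half_life_days * 24))."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return s0 * 0.5 ** (elapsed_hours / (half_life_days * 24))


def updated_confidence(prior_confidence: float, evidence_weight: float) -> float:
    """posterior = 0.7 * evidence_weight + 0.3 * prior_confidence."""
    return 0.7 * evidence_weight + 0.3 * prior_confidence


def apply_decay(belief: Belief, elapsed_hours: float) -> Belief:
    """Return a copy with decayed salience; axioms are returned unchanged."""
    if belief.is_axiom:
        return belief
    new_salience = decayed_salience(
        belief.salience, elapsed_hours, belief.half_life_days
    )
    return replace(belief, salience=new_salience)


if __name__ == "__main__":
    b = Belief("System target is alpha-node-4", confidence=0.9,
               salience=0.8, half_life_days=2.0)
    # After exactly one half-life (2 days = 48 hours) salience halves.
    print(apply_decay(b, elapsed_hours=48).salience)
    # Weak new evidence (0.2) pulls a 0.9 prior down sharply.
    print(round(updated_confidence(b.confidence, evidence_weight=0.2), 2))
```

The fixed 0.7 / 0.3 weighting means new evidence dominates the prior: a single low-weight observation moves a 0.9 confidence to roughly 0.41.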
This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.