Show HN: An AlphaEvolve-inspired evolution harness for Pokemon

hackernews | March 11, 2026 04:38 | 🔬 Research

#alphaevolve #flink #kafka #review #evolution-harness #pokemon

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

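This post covers the author's experiment applying an 'AlphaEvolve'-style approach, inspired by a DeepMind paper, to a Pokemon game agent. An LLM mutates the code, a fitness function scores each variant, and the loop discovers the best-performing algorithm on its own, which shows that this approach can turn up non-intuitive solutions a human would not think to try. Pointing to a case where the harness automatically discovered a door-cooldown optimization in the Pokemon game, the author argues that such an LLM-driven evolution loop can explore and optimize the search space faster than manual iteration.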

Main text

See also: pokemon-kafka, which streams gameplay events through Kafka for large-scale data processing and uses Flink for real-time anomaly detection and self-healing.

An autonomous Pokemon Red player that reads game memory, makes strategic decisions, and plays headlessly inside a stereOS VM.

    stereOS VM (/workspace)
    ┌──────────────────────────────────────────────────┐
    │                                                  │
    │  PyBoy (headless, window="null")                 │
    │    ↓ memory addresses                            │
    │  MemoryReader → BattleState / OverworldState     │
    │    ↓                                             │
    │  Strategy Engine (heuristic or LLM)              │
    │    ↓ button inputs                               │
    │  GameController → PyBoy                          │
    │                                                  │
    │  Tapes ← proxies LLM API calls, records sessions │
    │                                                  │
    └──────────────────────────────────────────────────┘
                 ↕ shared mount (./ ↔ /workspace)
    Host: frames/  .tapes/  pokedex/

The agent runs a tight loop: read game state from known memory addresses, pick an action, send button inputs, and tick the emulator forward. No display server is needed; screenshots come from PyBoy's internal frame buffer (screen.ndarray), not from the OS.

Shared mount permissions. The [[shared]] mount in jcard.toml maps ./ on the host to /workspace in the VM. Files keep their host ownership (UID 501 on macOS), but the VM runs as admin (UID 1000), so host-created directories are read-only inside the VM by default. The install script opens write permissions on the output directories (frames/, pokedex/, .tapes/) so the agent can write session data that persists back to the host.

    mb up       # boot the VM, install deps, start the agent through Tapes
    mb attach   # watch it play

The VM configuration lives in jcard.toml. It mounts the repo at /workspace, installs Python, PyBoy, and Tapes, and runs the agent.

    bash scripts/install.sh
    uv run scripts/agent.py rom/pokemon_red.gb --strategy heuristic --max-turns 1000

Add --save-screenshots to capture a frame every 10 turns into frames/. You must supply your own legally obtained ROM file in rom/.

Game loop. Each turn the agent ticks PyBoy forward, reads memory, decides, and acts.
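The per-turn loop described above (tick, read memory, decide, act) can be sketched in Python as follows. This is a minimal sketch, not the project's code: FakeEmulator, read_state, and decide are hypothetical stand-ins, and FakeEmulator deliberately does not mimic PyBoy's real API. Only the battle flag at 0xD057 (nonzero during a battle) comes from the README.

```python
from dataclasses import dataclass, field

BATTLE_TYPE_ADDR = 0xD057  # from the README: a nonzero value means a battle is active


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for PyBoy: a dict of RAM bytes plus a log of button presses."""
    ram: dict = field(default_factory=dict)
    pressed: list = field(default_factory=list)
    frames: int = 0

    def tick(self) -> None:
        # Advance one frame; a real emulator would execute the CPU here.
        self.frames += 1

    def read(self, addr: int) -> int:
        # Unset addresses read as 0, like cleared RAM.
        return self.ram.get(addr, 0)

    def press(self, button: str) -> None:
        self.pressed.append(button)


def read_state(emu: FakeEmulator) -> dict:
    # Hypothetical MemoryReader: this sketch only reads the battle flag.
    return {"in_battle": emu.read(BATTLE_TYPE_ADDR) != 0}


def decide(state: dict) -> str:
    # Hypothetical strategy: attack in battle, otherwise walk toward the next waypoint.
    return "a" if state["in_battle"] else "right"


def run(emu: FakeEmulator, max_turns: int) -> None:
    for _ in range(max_turns):
        emu.tick()               # tick the emulator forward
        state = read_state(emu)  # read game state from memory
        action = decide(state)   # pick an action
        emu.press(action)        # send the button input


emu = FakeEmulator()
run(emu, max_turns=3)
emu.ram[BATTLE_TYPE_ADDR] = 1  # simulate a battle starting
run(emu, max_turns=2)
print(emu.pressed)  # ['right', 'right', 'right', 'a', 'a']
```

Keeping the emulator behind a small interface like this is one way to unit-test decision logic without a ROM; the real agent drives PyBoy through its GameController.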
Turns are cheap: headless mode removes the 60fps cap and all rendering, so the emulator runs roughly 100x faster than real time. The agent runs hundreds of thousands of turns to progress through the game.

Memory reading. MemoryReader pulls structured data from fixed addresses in Pokemon Red's RAM: battle type, HP, moves, PP, map ID, coordinates, badges, and party state. These addresses are specific to the US release.

Battle strategy. When a battle is detected (0xD057 != 0), the agent evaluates the available moves using a type effectiveness chart, picks the highest-damage option, and manages healing and switching. The heuristic strategy requires no API calls.

Overworld navigation. Outside battle, the agent follows waypoints defined in references/routes.json. It handles early-game scripted sequences (Red's room to Oak's lab) and general map-to-map routing. A stuck counter triggers random movement to break out of loops.

Tapes proxies all LLM API calls made by the agent and records them in content-addressable session storage. The install script sets up Tapes automatically inside the VM. After a run, inspect what happened:

    tapes deck              # terminal UI for session exploration
    tapes search "battle"   # search session turns
    tapes checkout          # restore a previous conversation state

Session data lives in .tapes/ (gitignored).

Inspired by Mastra's observational memory, the observer reads the Tapes SQLite database, extracts noteworthy events via heuristic pattern matching (no LLM calls), and writes prioritized observations to memory files. Tapes records every LLM conversation as a content-addressable DAG of nodes in .tapes/tapes.sqlite. The observer walks these conversation chains, identifies patterns (errors, file creations, token usage), and writes observations alongside the database.
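The observer's LLM-free pattern matching can be illustrated with a keyword tagger that assigns the three priority tags the README uses for observations ([important], [possible], [informational]). This is a hedged sketch: the keyword lists reuse only the README's examples (bug/error/crash, test/refactor), while the prefix-matching rule and the tag_observation name are assumptions.

```python
import re

# Tiers are checked in order, so higher-priority tags win.
# Keywords are only the README's examples; the real lists may differ.
PRIORITY_KEYWORDS = [
    ("important", ("bug", "error", "crash")),
    ("possible", ("test", "refactor")),
]


def tag_observation(text: str) -> str:
    """Return '[important]', '[possible]' or '[informational]' for one observation."""
    words = re.findall(r"[a-z]+", text.lower())
    for tag, keywords in PRIORITY_KEYWORDS:
        # Prefix match (an assumption): "errors" matches "error", "refactoring" matches "refactor".
        if any(word.startswith(kw) for word in words for kw in keywords):
            return f"[{tag}]"
    return "[informational]"


for note in [
    "Tool error: KeyError in agent.py",
    "Added tests for the navigator",
    "Session goal: reach Viridian City",
]:
    print(tag_observation(note), note)
# [important] Tool error: KeyError in agent.py
# [possible] Added tests for the navigator
# [informational] Session goal: reach Viridian City
```

Prefix matching catches inflected forms at the cost of occasional false positives, and checking the tiers in order means a note that mentions both a crash and a refactor is tagged [important].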
    .tapes/
    ├── tapes.sqlite              # Tapes DB: nodes, embeddings, facets
    └── memory/
        ├── observations.md       # date-grouped observations with priority tags
        └── observer_state.json   # watermark tracking processed sessions

What it extracts:
- Session goals (first user message)
- Tool errors and exception tracebacks
- Files created during the session
- Token usage summaries

Each observation is tagged [important], [possible], or [informational] based on keyword matching (e.g. bug/error/crash are important, test/refactor are possible).

    # Preview observations without writing
    uv run scripts/observe_cli.py --dry-run
    # Process all unprocessed sessions
    uv run scripts/observe_cli.py
    # Reprocess everything from scratch
    uv run scripts/observe_cli.py --reset
    # Process a single session by root node hash
    uv run scripts/observe_cli.py --session

The CLI auto-detects .tapes/tapes.sqlite from the current working directory; override it with --db.

Inspired by AlphaEvolve (DeepMind), the agent can automatically improve its navigation parameters through headless evaluation runs. Instead of manually tuning thresholds, the evolution harness runs 10 agent variants in parallel, scores them against a composite fitness function, and keeps the winner.

How it works. The agent's navigator has tunable kn
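The evolution step described above (10 variants scored in parallel against a composite fitness function, keep the winner) can be sketched as a simple mutate-evaluate-select loop. Everything here is an assumption for illustration: the knob names stuck_threshold and door_cooldown are hypothetical (the README's parameter list is cut off in this excerpt), evaluate() is a toy formula rather than a headless game run, and random perturbation stands in for the LLM-proposed changes the summary describes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical navigator knobs and starting values (not the project's real parameters).
BASELINE = {"stuck_threshold": 20, "door_cooldown": 10}


def evaluate(params: dict) -> dict:
    """Toy stand-in for a headless evaluation run; returns made-up metrics."""
    # Pretend the best settings are stuck_threshold=8 and door_cooldown=3.
    return {
        "progress": 100 - (params["stuck_threshold"] - 8) ** 2,
        "wasted_turns": abs(params["door_cooldown"] - 3) * 5,
    }


def fitness(metrics: dict) -> float:
    # Composite score: reward progress, penalize wasted turns (weights are assumptions).
    return metrics["progress"] - 0.5 * metrics["wasted_turns"]


def mutate(params: dict, rng: random.Random) -> dict:
    # Random perturbation stands in for an LLM proposing a change; values stay >= 1.
    return {k: max(1, v + rng.randint(-3, 3)) for k, v in params.items()}


def evolve(generations: int, population: int = 10, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    best = dict(BASELINE)
    best_score = fitness(evaluate(best))
    with ThreadPoolExecutor(max_workers=population) as pool:
        for _ in range(generations):
            variants = [mutate(best, rng) for _ in range(population)]
            # Score all variants concurrently; map preserves input order.
            scores = list(pool.map(lambda p: fitness(evaluate(p)), variants))
            top = max(range(population), key=scores.__getitem__)
            if scores[top] > best_score:  # keep the winner only if it beats the incumbent
                best, best_score = variants[top], scores[top]
    return best, best_score


best, score = evolve(generations=30)
print(best, score)
```

This sketch keeps the incumbent unless a variant beats it, which is one reading of "keeps the winner". It uses threads only because the toy evaluation is instant; real headless emulator runs would more likely be spread across separate processes.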

View original (hackernews)

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be viewed via the source link.
