Show HN: Autoresearch, curated AutoResearch use cases

hackernews | March 24, 2026 04:42 | 📦 Open Source

#ai models #ai agents #autoresearch #claude #use cases #open source #optimization

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

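The recently released AutoResearch is drawing attention, through a curated list of use cases, as a tool that can make research workflows considerably more efficient. It offers functionality adapted to a range of scenarios, helping users save time by automating complex information-gathering and analysis work. The surrounding community is sharing concrete usage patterns and case studies built on AutoResearch, and interest among users is growing.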

Body

A curated list of AutoResearch use cases with optimization traces and open source implementations. Every entry includes a link to the actual optimization trajectory so you can see what the agent tried, not just the final result.

AutoResearch is, at its core, a prompt. Karpathy released it as a single markdown file - program.md, that instructs a coding agent (Claude Code, Codex, or similar) to follow an optimization workflow. The agent edits one file (train.py, that trains a language model), runs for a fixed 5 minutes on a GPU, checks whether the metric improved, and either commits the change or reverts it. Then it loops forever.

The specific program.md that ships with AutoResearch is written for one task: training a GPT model. But the structure - iteratively optimizing a file against an evaluation metric, with a discard/keep loop - turns out to be portable. In the weeks since release, the community has adapted it to GPU kernel optimization, template engine optimization, tabular ML engineering, and more. The program.md for each of these looks different, but the loop is the same.

| Use Case | Description | Author | Links | Traces |
|---|---|---|---|---|
| LLM training optimization | The original - optimize nanoGPT training code. 20 improvements found overnight on hand-tuned code | Andrej Karpathy | GitHub · Tweet | progress chart |
| Speed up Shopify's template engine | 53% faster parse+render, 61% fewer allocations from 93 automated commits on Shopify's Liquid engine | Tobi Lutke (Shopify CEO) | GitHub · Tweet | PR |
| GPU kernel optimization | Autoresearch applied to CUDA kernel optimization (18 → 187 TFLOPS) | RightNow AI | GitHub · Tweet | progress chart |
| Voice agent prompt engineering | Optimize voice agent prompts with automated evaluation (score 0.728 → 0.969) | Archie Sengupta | GitHub · Tweet | progress chart |
| Predict baseball pitch speed | Build predictive model for pitch velocity from biomechanics data (R² 0.44 → 0.78) | Kyle Boddy (Driveline Baseball) | Tweet | progress chart |
| XGBoost for tennis match prediction | Predict ATP/WTA match outcomes - encountered and documented reward hacking | Nick Oak | Blog · GitHub | blog |
| RL post-training optimization | Autoresearch for RL hyperparameters on Qwen 0.5B + GSM8K (eval 0.475 → 0.550 in fewer steps) | Vivek Kashyap | GitHub · Tweet | progress chart |
| Ancient scroll ink detection | Vesuvius Challenge autoresearch agent swarm for ink detection models. 4 agents 24/7, cross-scroll generalization nearly doubled | Vesuvius Challenge | Blog | blog |
| Earth system model optimization | Hybrid: LLM proposes formula structures, TPE optimizes parameters. Fire correlation 0.09 → 0.65 | Dev Paragiri (UMD CS) | Tweet · Blog | blog |
| Bitcoin price formula discovery | Autonomous search for best time-based formula predicting Bitcoin price. 328 experiments, 50.5% RMSE improvement over power law. Walk-forward OOS evaluation with bootstrap significance testing | Carlos Baquero | GitHub | progress chart |

| Project | Description | Links |
|---|---|---|
| autoresearch | The original - single GPU, 630 lines of Python | GitHub |
| pi-autoresearch | Generalized as a Pi extension. Works for any optimization target - test speed, bundle size, build times, Lighthouse scores | GitHub |
| autoresearch-mlx | Apple Silicon (MLX) port. No PyTorch required, uses unified memory | GitHub |
| autoresearch-win-rtx | Windows + consumer RTX GPU port (RTX 2060 through 4090) | GitHub |
| autoresearch-at-home | Distributed autoresearch - SETI@home style. Multi-agent swarm coordination | GitHub |
| autoresearch (Claude Skill) | Generalized as a Claude Code skill for any domain | GitHub |
| agent-digivolve-harness | A control layer for long-running CLI agent work. Generalizes the autoresearch keep/revert loop with persistent run state, explicit eval packages, baseline and holdout cases, and one bounded mutation per iteration | GitHub |
| auto-agent | Autoresearch, but for AI agents. Given a golden dataset, it autonomously improves a target agent through an iterative hypothesis-driven loop: analyze failures, spawn a coding agent to implement fixes, evaluate, and accept or rollback | GitHub |

Want to add a use case? Open a PR or file an issue. To make our work easier, please make submissions as verifiable as possible:

- Minimum: a progress chart showing each experiment's score and breakthrough annotations (e.g. Karpathy's progress chart)
- Ideal: a public repo with per-solution code and scores (the full exploration trace), or a Weco Observe dashboard link
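
As an illustration of the keep/discard loop described above, here is a minimal Python sketch. It is not Karpathy's program.md or the code of any listed project. The names keep_or_revert_loop, Experiment, toy_propose, and toy_evaluate are hypothetical. The agent's edit to train.py is replaced by a random change to one parameter, the fixed 5-minute GPU run is replaced by a toy scoring function, and "commit" and "revert" are modeled as keeping or dropping an in-memory candidate rather than as git operations. The sketch also assumes a lower-is-better metric and a bounded number of iterations instead of an endless loop.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Experiment:
    step: int
    score: float  # metric of this candidate (assumption: lower is better)
    kept: bool    # True if the change was kept, False if it was reverted


def keep_or_revert_loop(
    initial: dict,
    propose: Callable[[dict, random.Random], dict],
    evaluate: Callable[[dict], float],
    iterations: int,
    seed: int = 0,
) -> Tuple[dict, float, List[Experiment]]:
    """Run a bounded keep/revert loop; return (best, best_score, trace)."""
    rng = random.Random(seed)
    best = dict(initial)
    best_score = evaluate(best)
    trace: List[Experiment] = []
    for step in range(1, iterations + 1):
        candidate = propose(best, rng)  # stand-in for the agent editing the file
        score = evaluate(candidate)     # stand-in for the fixed-budget run
        kept = score < best_score
        if kept:
            # "commit": the candidate becomes the new baseline
            best, best_score = candidate, score
        # otherwise "revert": the baseline is left untouched
        trace.append(Experiment(step, score, kept))
    return best, best_score, trace


def toy_propose(params: dict, rng: random.Random) -> dict:
    """Hypothetical mutation: halve or double a single parameter."""
    new = dict(params)
    new["lr"] = params["lr"] * rng.choice([0.5, 2.0])
    return new


def toy_evaluate(params: dict) -> float:
    """Hypothetical metric: distance from an arbitrary target value."""
    return abs(params["lr"] - 0.004)


if __name__ == "__main__":
    best, best_score, trace = keep_or_revert_loop(
        {"lr": 0.001}, toy_propose, toy_evaluate, iterations=20
    )
    for exp in trace:
        marker = "kept" if exp.kept else "reverted"
        print(f"step {exp.step:2d}  score {exp.score:.4f}  {marker}")
    print("best:", best, "score:", best_score)
```

The returned trace records every experiment's score and whether it was kept, which is roughly the data a progress chart of the kind requested above would plot. The kept entries are the points where the best score improved.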

View original (hackernews)

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
