Operational Experience as a Performance Multiplier in AI Assistants
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
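A 2026 study by PaTech Labs reports that accumulated operational experience measurably improves AI assistant performance on domain-specific tasks. Across 8 conditions and 50 questions, 3 independent LLM judges produced 1,200 blind judgments, and the experience-augmented ARIA condition outperformed every baseline. Recent models including Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro were tested with default API parameters, and the data and analysis code have been released publicly.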
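The design arithmetic (8 conditions × 50 questions × 3 judges = 1,200 judgments) and the Friedman test named in the repository's README can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the scores are synthetic, the +1.0 shift given to condition 0 is hypothetical, and averaging the three judges per question is just one plausible aggregation, not necessarily the authors' method. The statistic is computed by hand, without a tie correction, so the block runs on the standard library alone.

```python
import random

# Design sizes taken from the study description.
N_CONDITIONS, N_QUESTIONS, N_JUDGES = 8, 50, 3


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for a questions x conditions table (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Synthetic scores[question][condition][judge]; condition 0 gets a hypothetical boost.
rng = random.Random(0)
scores = [
    [
        [rng.gauss(5.0 + (1.0 if c == 0 else 0.0), 1.0) for _ in range(N_JUDGES)]
        for c in range(N_CONDITIONS)
    ]
    for _ in range(N_QUESTIONS)
]

# Count every individual judgment, then average the judges per (question, condition).
n_judgments = sum(len(judges) for row in scores for judges in row)
per_question = [[sum(judges) / N_JUDGES for judges in row] for row in scores]
stat = friedman_statistic(per_question)

print("judgments:", n_judgments)
print("Friedman chi-square (df = %d): %.2f" % (N_CONDITIONS - 1, stat))
```

With SciPy installed, `scipy.stats.friedmanchisquare` could be called with one sequence per condition to obtain a p-value as well; the hand-rolled version here only exposes the ranking step that the test relies on.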
Main text
Data and code for the paper: "Operational Experience as a Performance Multiplier in AI Assistants: A Controlled Study with Triple-Judge Blind Evaluation" by Ravshan Nuraliev and Anastasia Rychkova (PaTech Labs, 2026).

This repository contains the complete experimental data and analysis code for Study 15c, which demonstrates that accumulated operational experience measurably improves AI assistant performance on domain-specific tasks.

Key findings:
- 8 conditions, 50 questions, 1,200 blind judgments from 3 independent LLM judges
- Experience-augmented condition (ARIA) significantly outperforms all baselines (Friedman p […] conditions)

    code/
      generate_figures_v3.py    # Generate Figures 6-7 (judge agreement scatter, forest plot)
      fix_all_figures_qa.py     # Generate Figures 1-5 with QA fixes (all other figures)
    paper/
      main.tex                  # LaTeX source
      references.bib            # Bibliography (26 entries)
      main.pdf                  # Compiled paper (17 pages)

    pip install numpy scipy matplotlib

For LaTeX compilation: TinyTeX or TeX Live with natbib, booktabs, hyperref.

    cd code
    python3 generate_figures_v3.py    # Figures 6-7
    python3 fix_all_figures_qa.py     # Figures 1-5

Not included in this repository:
- ARIA's operational memory corpus (contains proprietary and personal information)
- Raw model responses (available upon request for verification)
- V6 judging prompt text (included in the paper's appendix description; full text available upon request)

All models were called with default API parameters (no explicit reasoning-effort or thinking-mode overrides).
| Model | Vendor | Parameters |
|---|---|---|
| Claude Opus 4.6 | Anthropic | Default API parameters |
| Claude Sonnet 4.6 | Anthropic | Default API parameters |
| GPT-5.4 | OpenAI | Default API parameters |
| Gemini 3.1 Pro | Google | Default API parameters |
| Codex 5.3 | OpenAI | Default API parameters |

    @article{nuraliev2026experience,
      title={Operational Experience as a Performance Multiplier in AI Assistants: A Controlled Study with Triple-Judge Blind Evaluation},
      author={Nuraliev, Ravshan and Rychkova, Anastasia},
      year={2026},
      note={PATech Labs}
    }

MIT License. See data files for individual question source attributions.

Ravshan Nuraliev, Anastasia Rychkova ([email protected])
This analysis was written by the Genesis Park editorial team with the help of AI. The original is available through the source link.