FrontierSWE – A Benchmark for Long-Horizon Coding Tasks
hackernews
📦 Open Source
#OpenSource
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
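FrontierSWE is an ultra-long-horizon benchmark for coding agents, built together with partners from academia and industry. It evaluates how well agents solve real-world problems drawn from domains such as performance engineering, computational science, and ML research. Results and analysis for frontier models are published on the project's leaderboard and blog, which are intended to show whether these models can handle implementation and research work at a practical, real-world level.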
Full text
Proximal-Labs / frontier-swe (public GitHub repository; 87 stars, 3 forks, 390 commits on main)

Top-level contents: docker/first_party_cli, harbor_ext, tasks, .gitignore, README.md, pyproject.toml, uv.lock

FrontierSWE

FrontierSWE is an effort to test coding agents on the hardest ultra-long-horizon technical challenges. Together with partners from academia and industry, we have collected real-world problems from domains including performance engineering, computational science, and ML research, and evaluated how well frontier models can perform on them. See the leaderboard and blog for results and analysis. FrontierSWE is also available as a Prime Intellect Environment.

About: FrontierSWE is an ultra-long-horizon coding agent benchmark that tests implementation, performance engineering, and ML research. Website: frontierswe.com

Languages: C 35.6%, Dart 30.9%, JavaScript 19.5%, Python 4.2%, Perl 3.3%, Tcl 2.6%, Other 3.9%
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.