OpenAI Parameter Golf: Fit the Best Possible LLM into a 16MB Artifact

hackernews | 📦 Open Source
#ai competition #ai models #chatgpt #llm #openai #model compression #parameter golf
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

OpenAI has launched the "Parameter Golf" challenge, inviting participants to build the best possible language model within a 16MB size limit and a 10-minute training budget. Submissions are scored by compression performance on the FineWeb validation set; OpenAI is sponsoring $1,000,000 in compute credits, and outstanding participants may be considered for hiring opportunities. The competition aims to test whether participants can maximize efficiency in a parameter-constrained setting through original architectures and compression techniques.

Full Text

OpenAI Model Craft Challenge: Parameter Golf is a challenge to train the best language model that fits in a 16MB artifact and trains in under 10 minutes on 8xH100s, evaluated by compression on the FineWeb validation set (tokenizer-agnostic, bits per byte). This challenge is heavily inspired by the NanoGPT Speedrunning challenge, where participants compete to train a model that reaches 3.28 FineWeb validation loss as quickly as possible.

We're excited to see how optimizing for a parameter-constrained setting pushes people toward unique architectures (test-time compute, aggressive parameter tying, depth recurrence, low-rank training, ...), compression schemes (low precision, QAT, bitnets, novel tokenizers, ...), and other creative submissions (test-time training, long context, megakernels, ...).

If you're familiar with neural scaling laws, you can consider this challenge a form of L(N) optimization, where the objective is to reach the lowest loss given a fixed number of parameters (N), unconstrained by data, compute, steps, or architecture. Challenges like the NanoGPT Speedrun, which optimizes a form of L(T) (lowest time given a constrained loss), or the NanoGPT Slowrun, which optimizes L(D) (lowest loss given a constrained dataset size), can be thought of as equivalent challenges in this family.

Ideally, we'd allow submissions to use arbitrary computational resources. But to keep the challenge from being inaccessibly expensive, we're limiting leaderboard submissions to 10 minutes on 8xH100s. However, we'd still love to see submissions that don't meet the compute limitation requirements in our 'Non-record Submissions' section: we're excited to see people push the infinite frontier of parameter-limited performance as well.

We also know compute is expensive, so OpenAI is sponsoring $1,000,000 in compute credits to help people get started training their models. To request a compute grant, use this form: Request a Compute Grant. When requesting compute, please make sure you choose the appropriate level, write sufficient justification, and submit with an email tied to an OpenAI / ChatGPT account.

If you enjoy solving very difficult technical problems, please introduce yourself via the Challenge Participant Form. It helps us attribute challenge submissions and reach out about opportunities with OpenAI. Completing the form is not required to participate.

Many researchers at OpenAI first distinguished themselves through elite mathematics and programming competitions. The Model Craft Challenge is designed in that spirit: it tests the ability to tackle unfamiliar problems with creativity and rigor, qualities we believe are essential for frontier AI research. In June, we plan to hire a small cohort of early-career researchers, targeting current undergraduate students and recent graduates, including Olympiad medalists and elite competitors. For exceptional participants, the challenge may also serve as a way to stand out to OpenAI researchers and recruiters.

The challenge runs from March 18th to April 30th. Happy training!
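To make the evaluation metric concrete: scoring by bits per byte converts the model's total negative log-likelihood over the validation text into bits and divides by the raw byte count, which is what makes the metric tokenizer-agnostic; a novel tokenizer changes the loss per token but not the quantity being measured. A minimal sketch (the helper name and the ~4.3 bytes-per-token figure are illustrative assumptions, not part of the challenge spec):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats, i.e. natural-log
    cross-entropy summed over every predicted token) into bits per byte,
    the tokenizer-agnostic score used on the leaderboard."""
    total_bits = total_nll_nats / math.log(2)   # nats -> bits
    return total_bits / total_bytes             # normalize by raw UTF-8 bytes

# Illustration: a mean loss of 3.28 nats/token (the NanoGPT Speedrun target)
# on text averaging roughly 4.3 bytes per token would score about
# 3.28 / ln(2) / 4.3 ≈ 1.10 bits per byte, in the range of the leaderboard.
```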
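For readers less familiar with the scaling-law framing, a common Chinchilla-style parameterization (used here purely as an illustration; the challenge page does not prescribe any particular fit) makes the L(N) / L(T) / L(D) taxonomy explicit:

```latex
% Chinchilla-style loss decomposition (illustrative, not from the challenge):
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% Parameter Golf fixes the parameter budget N (via the 16MB artifact) and
% leaves data, steps, and architecture free, so it effectively targets
L^{*}(N) = \min_{D,\ \text{architecture},\ \text{training recipe}} L(N, D)

% whereas the NanoGPT Speedrun fixes a loss target and minimizes wall-clock
% time T (a form of L(T)), and the Slowrun fixes D and minimizes loss (L(D)).
```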
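The 16MB artifact limit is also worth translating into parameter counts, since it is what pushes entrants toward aggressive quantization. A rough back-of-the-envelope sketch (assuming a 16 MiB budget and ignoring any further compression, such as zstd, on top of the quantized weights):

```python
BUDGET_BYTES = 16 * 2**20   # assumption: 16 MiB; the rules may mean 16,000,000 bytes

def max_params(bits_per_param: float) -> float:
    """Upper bound on the parameter count that fits in the artifact at a
    given storage precision, ignoring headers, scales, and entropy coding."""
    return BUDGET_BYTES * 8 / bits_per_param

for name, bits in [("fp16", 16), ("int8", 8), ("int6", 6),
                   ("ternary, log2(3) ≈ 1.585 bits", 1.585)]:
    print(f"{name}: ~{max_params(bits) / 1e6:.1f}M params")

# fp16  ~8.4M   int8    ~16.8M
# int6 ~22.4M   ternary ~84.7M  (the 73.7M-param ternary leaderboard entry
#                                below fits once per-tensor overhead is included)
```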
| Run | Score (bits per byte) | Author | Summary | Date | Info |
|---|---|---|---|---|---|
| 11L AR Self-Gen GPTQ + XSA | 1.1147 | abaybektursun | On PR #1019: Self-Generated GPTQ Calibration Data + all-layer XSA on the PR #549 stack | 2026-03-25 | info |
| LeakyReLU² + Legal Score-First TTT + Parallel Muon | 1.1194 | abaybektursun | On PR #549: LeakyReLU(0.5)^2 + TTT + Parallel Muon on the PR #414 stack | 2026-03-23 | info |
| 11L EMA + GPTQ-lite + warmdown3500 | 1.1228 | signalrush | On PR #374: GPTQ-lite clip search + EMA, plus warmdown3500 and [email protected] | 2026-03-22 | info |
| 11L Partial RoPE + LN Scale + EMA + XSA4 | 1.1248 | jfprincz | On PR #287: Partial RoPE (16/64) + layerwise LN scale | 2026-03-21 | info |
| 11L XSA4 + EMA + Int6 MLP3x | 1.1271 | jfprincz | On PR #198: XSA on the last 4 layers + EMA replacing SWA | 2026-03-20 | info |
| 11L Efficient Partial XSA | 1.1307 | unnir | On PR #198: Efficient Partial XSA on the deepest 3 layers | 2026-03-20 | info |
| 10L Int5-MLP + BigramHash(10240) | 1.1428 | thwu1 | 10 layers, mixed int5/int6 quantization, BigramHash(10240), SWA(0.4), WD=0.04 | 2026-03-20 | info |
| Int6 MLP3x + SmearGate + BigramHash | 1.1458 | Raahil Shah | 3x MLP + SmearGate + BigramHash + OrthoInit + Muon WD + SWA | 2026-03-20 | info |
| 11L MLP3x + Int6 QAT | 1.1502 | aruniyer | 11 layers, 3x MLP, int6 QAT, zstd-22, WD=0.04, sliding eval | 2026-03-20 | info |
| SmearGate + OrthoInit + Muon WD | 1.1556 | aquariouseworkman | SmearGate + BigramHash + 3x MLP + int6 STE QAT + sliding eval | 2026-03-19 | info |
| Ternary Quantization | 1.1570 | Ciprian-Florin Ifrim | 73.7M params quantized to 1 0 -1 + misc arch changes | 2026-03-24 | info |
| 10L Int6 QAT + Zstd MLP2.6x | 1.1586 | yahya010 | 10 layers, int6 QAT + zstd-22, MLP 1344, Muon 0.99, sliding eval | 2026-03-19 | info |
| Mixed Quant + Sliding Window Eval | 1.1630 | aquariouseworkman | Int6 block weights + int8 embeddings + 3x MLP | | |
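Several of the leaderboard entries above mention int6 QAT with a straight-through estimator (STE). A minimal PyTorch sketch of that general idea, not any participant's actual code:

```python
import torch

class Int6FakeQuant(torch.autograd.Function):
    """Straight-through-estimator (STE) fake quantization to a signed 6-bit
    grid: a generic sketch of the 'int6 QAT' idea, shown for illustration."""

    QMIN, QMAX = -32, 31                      # signed 6-bit integer range

    @staticmethod
    def forward(ctx, w):
        # Per-tensor symmetric scale; round and clamp to the int6 grid,
        # then dequantize so the rest of the forward pass sees float weights.
        scale = w.abs().max().clamp_min(1e-8) / Int6FakeQuant.QMAX
        q = torch.round(w / scale).clamp(Int6FakeQuant.QMIN, Int6FakeQuant.QMAX)
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        # STE: treat the rounding as the identity so the float master
        # weights keep receiving useful gradients.
        return grad_out

# A layer would call Int6FakeQuant.apply(self.weight) in its forward pass;
# the shipped 16MB artifact then stores only the int6 codes plus the scales.
```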

This analysis was prepared by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
