GPT's GPT: A New MicroGPT
hackernews
📦 Open Source
#ai-research
#chatgpt
#gpt
#microgpt
#machine-learning
#machine-learning/research
#auto-generated
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This project grew out of an interview with Andrej Karpathy and was conceived to test whether an LLM can implement a core algorithm like microgpt on its own. Without writing a single line of code by hand, the author ran an experiment in which prompts alone were used to generate single-file implementations containing the training and inference elements of a GPT Transformer. The result was a Python Autograd version and a Julia matrix-calculus version, each a concise implementation of around 200 lines. The experiment shows that, for educational purposes, it is possible to produce an implementation of the core algorithm with the abstractions stripped away, rather than a large codebase.
Body
Live three-column view (interactive, with display settings): https://entrpi.github.io/microgpt-denovo/

This readme is the only hand-written file in this repo. I started this project after watching Andrej Karpathy's recent interview on No Priors, where he explained that he had to hand-write microgpt, a 200-line GPT implementation in Python which distills the essence of all the algorithms behind creating Transformers, because the LLMs he asked weren't able to do it. I wanted to test whether this is still true: whether a "microgpt" in that spirit could be brought into existence with minimal manual intervention, just clear expression of intent to an LLM. This is an experiment not only in producing a tiny GPT artifact, but in seeing how close you can get to the essence of microgpt through careful prompting alone, without writing a single line yourself.

Most Transformer codebases are optimized for scale, performance, flexibility, or framework ergonomics. Those are worthwhile goals, but they often hide the actual learning algorithm behind layers of abstraction. What microgpt aims for is different: a single-file presentation of the core algorithmic elements of training and inferencing a GPT-style decoder, written as tersely as possible without losing the thread of the ideas. This means every major mechanism should appear directly on the page: tokenization, embeddings, causal self-attention, residual connections, normalization, MLP, cross-entropy, backpropagation, Adam, and autoregressive sampling. If a detail can be derived from more basic choices, I want it derived. If a symbol can be explained in plain language, I want it explained. If a convenience abstraction hides the algorithm, I want to remove it.

You can see in the dialog below my very-first-attempt conversation with ChatGPT 5.4 xhigh in Codex App on Mar 23, 2026, which produced the files below:

- microgpt.py: a validated Python implementation with a from-scratch scalar autograd engine and explicit chain-rule edges.
- microgpt_matrix.jl: a validated Julia implementation of the matrix-calculus version with explicit forward and backward passes.

I was inspired to go in the matrix math direction for speed when writing my own EEmicroGPT, which is the fastest implementation of microgpt but does not aim for brevity or clarity. I was inspired to make that brief and clear too after seeing ssrhaso/microjpt.

The closest thing to "code" contributed to the files is just the initial constants, data, and constraints that give the model the same shape as Karpathy's microgpt. The rest was left to the agent, with further guidance directed at elements of style and structure at a high level.
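That mechanism list maps onto surprisingly little code. As a rough, hypothetical illustration (not the repo's code), here is a NumPy sketch of the forward pass through one decoder block with the shape requested in the dialog below: 16-wide embedding, 4 heads, block size 16, RMSNorm in place of LayerNorm, no biases, and a ReLU MLP. Tokenization, the loss, backpropagation, Adam, and sampling are omitted here.

```python
# Hypothetical sketch, not the repo's code: one decoder block, forward pass only.
import numpy as np

rng = np.random.default_rng(0)
T, C, H = 16, 16, 4          # block size, embedding width, number of heads
hs = C // H                  # per-head width

def rmsnorm(x, g):           # RMSNorm: scale by root-mean-square, no bias
    return g * x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + 1e-5)

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    t = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv                     # (t, C) each
    q = q.reshape(t, H, hs).transpose(1, 0, 2)           # (H, t, hs)
    k = k.reshape(t, H, hs).transpose(1, 0, 2)
    v = v.reshape(t, H, hs).transpose(1, 0, 2)
    att = q @ k.transpose(0, 2, 1) / np.sqrt(hs)         # (H, t, t) scores
    mask = np.tril(np.ones((t, t), dtype=bool))          # causal mask: no peeking ahead
    att = np.where(mask, att, -np.inf)
    att = np.exp(att - att.max(axis=-1, keepdims=True))
    att = att / att.sum(axis=-1, keepdims=True)          # softmax over keys
    y = (att @ v).transpose(1, 0, 2).reshape(t, C)       # merge heads
    return y @ Wo

def block(x, p):             # pre-norm residual block: attention, then ReLU MLP
    x = x + causal_self_attention(rmsnorm(x, p["g1"]), p["Wq"], p["Wk"], p["Wv"], p["Wo"])
    h = np.maximum(0.0, rmsnorm(x, p["g2"]) @ p["W1"])   # ReLU, no biases
    return x + h @ p["W2"]

params = {k: rng.normal(0, 0.02, s) for k, s in {
    "Wq": (C, C), "Wk": (C, C), "Wv": (C, C), "Wo": (C, C),
    "W1": (C, 4 * C), "W2": (4 * C, C)}.items()}
params["g1"] = np.ones(C)
params["g2"] = np.ones(C)

x = rng.normal(size=(T, C))  # stand-in for token + position embeddings
print(block(x, params).shape)  # (16, 16)
```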
The following dialog produced the current artifacts.

User: Hello. In this directory our project is to implement the core algorithmic elements of training and inferencing a GPT Transformer model in a terse but pedagogically motivated manner. That means we want to distill just the essential elements of the algorithms into as few well-documented lines of code as possible. It should be a single file with the complete algorithm. We will train against a list of names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/names.txt Tokenize each unique char in the dataset. We will use Autograd, a tiny model (1 layer, 16-wide embedding, blocks of 16, 4 heads). Follow GPT-2 with minor differences: layernorm -> rmsnorm, no biases, GeLU -> ReLU. Use the Adam optimizer. Train 1000 steps. Run inference at 0.5 temperature. It's possible to do this in real code, including comment lines and a generous header block, in between 80 and 200 lines depending on the language used. Try with pseudocode first.

Assistant: I added a single-file pseudocode draft at `microgpt_pseudocode.py`. It’s 158 lines and includes the full path end to end: download/tokenize the names corpus, initialize a 1-layer 16-wide 4-head decoder-only GPT, train with Adam for 1000 steps, and sample at temperature `0.5`. The main modeling choice is to train on the raw newline-delimited character stream, so `'\n'` becomes a real token and doubles as the boundary token for sampling. That keeps the code short and avoids introducing extra BOS/PAD machinery while still matching the dataset cleanly. I did not run it as code; it is intentionally Python-shaped pseudocode, not yet a verified executable Autograd implementation. If you want, I can turn this directly into the real single-file Autograd version next.

User: Now instead of Autograd, what about pure matrix math? Don't constrain the pseudocode to what Python can do. Perhaps the abstractions of Julia or other more mathematically oriented languages would give a cleaner pseudocode basis. It should be possible to express in under 100 clear (pedagogical, not minified) lines.

Assistant: Yes. A Julia-like matrix-calculus pseudocode basis is cleaner for this teaching goal because it lets the file talk in terms of tensor primitives instead of Python/autograd mechanics. I added a 90-line version at `microgpt_matr
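For a sense of what "a from-scratch scalar autograd engine and explicit chain-rule edges" can look like, here is a minimal hypothetical sketch; the actual `microgpt.py` produced by the dialog may be organized quite differently. Each scalar records its parents and the local derivatives along each edge, and `backward` walks the graph in reverse topological order applying the chain rule.

```python
# Hypothetical sketch of a scalar autograd engine with explicit chain-rule edges.
import math

class Value:
    """A scalar node in the computation graph; each op records its parents
    and the local derivatives needed to apply the chain rule backwards."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # chain-rule edges: which nodes fed this one
        self._local_grads = local_grads  # d(output)/d(parent) along each edge

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def relu(self):
        return Value(max(0.0, self.data), (self,), (1.0 if self.data > 0 else 0.0,))

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))

    def backward(self):
        # Topologically order the graph, then push gradients along each edge.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad   # chain rule: dL/dp += dz/dp * dL/dz

# Tiny check: d/dx of relu(x*y + x) at x=2, y=3 is y + 1 = 4, and d/dy is x = 2.
x, y = Value(2.0), Value(3.0)
z = (x * y + x).relu()
z.backward()
print(x.grad, y.grad)   # 4.0 2.0
```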
This analysis was written by the Genesis Park editorial team with the help of AI. The original article can be found via the source link.