OpenMythos: A looped transformer take on how Claude Mythos might work
hackernews
AI Models
#anthropic
#claude
#claude mythos
#llama
#machine learning
#mistral
#openai
#openmythos
#transformer
Summary
With the inner workings of Claude Mythos undisclosed, developer Kye Gomez analyzed the related papers and the model's observed behavior patterns and built a similar model himself. He released the model, based on recurrent transformers and a looped architecture, under the MIT license, and the code is installable today.
Why it matters
Related entities
OpenMythos
Claude Mythos
Anthropic
Kye Gomez
Looped transformer
Parameter efficiency
Body
Anthropic hasn’t told anyone how Claude Mythos works. No architecture paper, no model card with details. Just a product that keeps surprising people and a company that stays quiet about why. That silence has been driving the research community a little crazy.

So one developer, Kye Gomez, did something about it. He read every public paper he could find on recurrent transformers, looped architectures, and inference-time scaling. He studied the behavioral patterns people were reporting from Mythos. Then he built what he thinks is inside it, published the code under MIT, and made it pip installable.

It’s called OpenMythos. It is not Claude Mythos, and Gomez is explicit about that. But the hypothesis behind it is serious, the architecture is real, and the reasoning for why Mythos might work this way is harder to dismiss than you’d expect.

What OpenMythos actually is

Most open-source model releases give you weights. OpenMythos gives you a blueprint. No pretrained weights exist yet. What Gomez published is the full architecture he believes Mythos is built on, a training script to build it yourself, and seven size options from 1B to 1T parameters. You pick your scale, point it at your data, and train it. The pip install takes seconds. The training takes considerably longer.

What’s sitting inside that blueprint is where things get genuinely interesting, and to understand it, you need to understand one design decision that separates this from every other open model you’ve probably heard of.

The architecture theory

Every model you’ve used before, whether Llama, Gemma, or Mistral, stacks layers: hundreds of them, each running once, passing results to the next one down the line. More layers means a smarter model, but also a bigger, heavier, more expensive one to run. Gomez’s theory is that Mythos doesn’t stack. It loops.
Instead of hundreds of unique layers each running once, a small set of layers runs the same computation multiple times before the model produces any output. Same weights, repeated passes, progressively deeper reasoning without the parameter explosion that usually comes with depth.

Think of it like drafting an answer in your head. First pass, you get the rough shape. Second pass, you catch what you missed. Third pass, you refine. By the time you speak, you’ve already worked through several versions internally. Nobody watching saw any of that; they just got the final answer.

That’s roughly what’s happening here. Each loop updates the model’s internal state, building on the previous pass. The original input gets re-injected at every loop so the model stays anchored to what you actually asked; without that, it would drift. After enough passes, it produces output. All the intermediate work happened silently, never becoming visible tokens.

This is why the theory fits Mythos’s behavior so well. Mythos consistently handles hard multi-step problems without showing its work by default. A looped architecture would do exactly that: the reasoning lives inside the loops, not in the output stream.

There’s a practical upside too. A model that reasons through looping can be dramatically more parameter-efficient than one that reasons through sheer layer depth. You get deeper thinking without paying for it in model size. The catch is that looped models are historically painful to train, and the internal state can spiral out of control across iterations. OpenMythos implements a fix from recent research that constrains the architecture so stability is guaranteed by design, not by luck. The repo even prints a stability check at runtime so you can verify it’s behaving.

Why this might actually explain Mythos

To be clear, this is speculation. Educated, well-researched speculation, but speculation; nobody outside Anthropic actually knows.
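Before weighing the evidence, the loop mechanism itself can be made concrete with a toy sketch. This is not OpenMythos code; every name and number below is illustrative. The same weights run on every pass, the original input is re-injected each loop, and keeping the recurrent weight’s magnitude below one keeps the state bounded, a crude stand-in for the stability constraint the repo enforces by design.

```python
# Toy sketch of looped vs. stacked depth, based only on the description
# above. Not OpenMythos code; names and numbers are illustrative.

def stacked_params(n_layers: int, params_per_layer: int) -> int:
    """Conventional stack: every layer carries its own weights."""
    return n_layers * params_per_layer

def looped_params(n_shared_layers: int, params_per_layer: int) -> int:
    """Looped core: the same weights are reused on every pass, so extra
    loops add depth without adding parameters."""
    return n_shared_layers * params_per_layer

def looped_forward(x: float, weight: float, inject: float, loops: int) -> float:
    """One shared 'layer' (here just a scalar multiply) run `loops` times.
    The original input x is re-injected each pass so the state stays
    anchored to the prompt; |weight| < 1 keeps the state bounded across
    iterations, a crude stand-in for the stability constraint."""
    state = x
    for _ in range(loops):
        state = weight * state + inject * x  # same weights every pass
    return state

# Depth comes from loop count, not parameter count: an 8-layer core looped
# 10 times matches the effective depth of an 80-layer stack at a tenth of
# the weights.
print(stacked_params(80, 1_000_000))   # 80000000
print(looped_params(8, 1_000_000))     # 8000000
print(looped_forward(1.0, weight=0.5, inject=0.5, loops=10))  # stays bounded
```

The intermediate states in the loop never become tokens, which is the sketch’s version of reasoning happening silently before any output appears.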
That said, four things about Mythos’s behavior map oddly well onto this theory.

Mythos handles problems it’s never seen before better than models of comparable size. Looped transformers are specifically good at this: the capability doesn’t emerge gradually, it phase-transitions in after enough training. Mythos also handles deeply compositional problems, like ten-step math, long arguments, and multi-layer code, without explicit chain-of-thought. More loops at inference means deeper reasoning chains, which is exactly the mechanism a looped model would use. The reasoning also happens silently, in continuous space, which matches how Mythos behaves when it’s not in extended thinking mode. And the parameter-efficiency story fits: a model that reasons through looping needs far fewer parameters to achieve the same depth as a stacked architecture.

None of this proves anything. It’s a theory that fits the observed behavior. Which is exactly what makes OpenMythos interesting to follow.

What you can run today

Seven model scales ship with the repo, 1B through 1T, each preconfigured so you’re not tuning architecture by hand. The 1B and 3B variants are realistic on consumer hardware. Anything above 50B needs a proper cluster. The training script for the 3B on FineWeb-Edu is included and works single-GPU or multi-GPU out of the box via torchrun. The tokenizer uses OpenAI’s gpt-oss-20b. Training runs in bfloat16 on modern GPUs, or float16 with gradient scaling on older ones. Attention is your choice, MLA or GQA, set in config before you initialize. MLA is closer to what DeepSeek uses and is more parameter-efficient. GQA is simpler and better supported across inference engines. There are no pretrained weights to download. You’re training from scratch. That’s where this project is today.

Is this for you?

If you research transformer architectures or study inference-time scaling, clone the repo tonight. The Parcae stability implementation alone is worth reading through.
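If you do plan to train, the precision and attention choices described above reduce to a small decision. The helper below is a hedged illustration of that logic only; the function and key names are invented for this sketch, not OpenMythos’s actual config API.

```python
# Hedged sketch of the config decisions the article describes: bfloat16 on
# modern GPUs vs. float16 with gradient scaling on older ones, and MLA vs.
# GQA attention. Names are invented for illustration, not OpenMythos's API.

def choose_precision(supports_bf16: bool) -> dict:
    """bfloat16 needs no gradient scaling; float16 does."""
    if supports_bf16:
        return {"dtype": "bfloat16", "grad_scaling": False}
    return {"dtype": "float16", "grad_scaling": True}

def choose_attention(prefer_param_efficiency: bool) -> str:
    """MLA: DeepSeek-style, more parameter-efficient.
    GQA: simpler, better supported across inference engines."""
    return "MLA" if prefer_param_efficiency else "GQA"

config = {
    "scale": "3B",  # one of the seven presets, 1B through 1T
    **choose_precision(supports_bf16=True),
    "attention": choose_attention(prefer_param_efficiency=True),
}
print(config)
```

The one real constraint the article notes is that the attention variant must be set in config before the model is initialized, so this choice happens once, up front.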
If you build on open models and keep hitting a ceiling on complex reasoning tasks, this gives you a genuinely different architectural direction to experiment with. And if you’re just someone who finds it fascinating that a developer sat down, read every public paper he could find, and tried to reconstruct one of the most capable closed models in existence, that’s reason enough to bookmark this one.

The weights don’t exist yet. The theory might be wrong. But the code is real, the license is clean, and the question it’s asking is one Anthropic still hasn’t answered.