Show HN: An Experiment Mapping the "Primitive Layer" of Language Models
hackernews
📦 Open Source
#ai models
#llama
#ollama
#qwen
#benchmarks
#language models
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This research identifies a "primitive layer" inside language models and presents evidence that it mirrors the root concepts of human language. In experiments across four small model architectures, primitive concepts such as FEAR and KNOW appeared as measurable activation patterns, and their combinations predicted more complex concepts. The researchers also demonstrated, through the GOG and SRM architectures, that when given a symbolic reasoning structure even a 0.5B-parameter model can overcome natural-language prompt failures and substantially improve its reasoning ability.
Full Text
```bash
pip install requests
ollama pull qwen2.5:0.5b
python3 -c "
from sel.core.router import process
print(process('I miss my hometown'))
"
```

Active research. Benchmarks are reproducible, results are real, and the research is ongoing. Contributions, challenges, and replications are welcome via issues or pull requests.

This repository documents two connected research programs.

GOG (Graph-Oriented Generation) replaces probabilistic vector retrieval with deterministic graph traversal over a codebase's actual import dependency structure. It is complete, benchmarked, and validated. The paper is available here: GOG_PAPER.pdf

SRM (Symbolic Reasoning Membrane) is the deeper investigation that GOG made possible. It asks: if structure can control a language model's output reliably, what is the minimum structure required? What are the atomic units of meaning that a language model recognizes, responds to, and can combine into richer concepts? Eighteen experiments later, that question has a partial and surprising answer. The full research paper is here: SRM_PAPER.md

Across four language model architectures — Qwen 2.5 (0.5B), Gemma 3 (1B), LLaMA 3.2 (1B), and SmolLM2 (360M) — we found consistent empirical evidence for a primitive layer underlying language model behavior.

Anna Wierzbicka proposed in the 1970s that all human languages share approximately 65 irreducible semantic concepts — WANT, KNOW, FEEL, GOOD, BAD, DO, HAPPEN — from which all other meaning is constructed. Cowen and Keltner identified 27 universal emotional states in 2017. We tested whether these primitives appear as measurable activation patterns in small language models. They do. Specifically:

**The Layer 0a/0b distinction is real and architecture-independent.** Scaffolding primitives — SOMEONE, TIME, PLACE — produce abstract, relational responses. Content primitives — FEAR, GRIEF, JOY, ANGER, RELIEF, NOSTALGIA — produce phenomenological, embodied responses. The activation gap between these two classes averaged +0.245 across all four models, and the direction was consistent in every model tested.

**Primitive composition produces predictable Layer 1 concepts.** Eleven operator-seed combinations matched pre-registered predictions in three out of four model architectures:

| Combination | Predicted | Validated |
|---|---|---|
| KNOW + FEAR | dread / awareness | ✓ 3/4 models |
| FEEL + GRIEF | heartbreak / sorrow | ✓ 3/4 models |
| WANT + FEAR | anxiety / avoidance | ✓ 3/4 models |
| WANT + ANGER | ambition / revenge | ✓ 3/4 models |
| TIME + GRIEF | mourning / melancholy | ✓ 3/4 models |
| TIME + NOSTALGIA | memory / reminiscence | ✓ 3/4 models |
| TIME + RELIEF | healing / recovery | ✓ 3/4 models |
| WANT + GRIEF | longing / yearning | ✓ 3/4 models |
| WANT + NOSTALGIA | longing / regret | ✓ 3/4 models |
| FEEL + JOY | delight / bliss | ✓ 3/4 models |
| KNOW + NOSTALGIA | wisdom / reflection | ✓ 3/4 models |

**The scaling pattern has an implication.** The primitive activation gap is largest in the smallest model and narrows as model size increases — not because content primitives weaken, but because larger models develop richer phenomenological access to scaffolding primitives too. As language models scale, they appear to converge toward a more coherent internal representation of the primitive layer. This may partly explain why larger models reason better — they are closer to the atoms of meaning.
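The excerpt above does not show how the activation gap itself is computed, so the following is a loudly hypothetical illustration only: it operationalizes "activation" as the mean last-layer hidden-state norm of a primitive word embedded in a fixed carrier sentence, using SmolLM2-360M (the smallest of the four tested models). The carrier sentence, the metric, and the Hugging Face model ID are all assumptions, not the paper's method.

```python
# HYPOTHETICAL probe of the Layer 0a/0b gap; not the repository's actual metric.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-360M"  # assumed checkpoint for SmolLM2 (360M)
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

SCAFFOLDING = ["someone", "time", "place"]   # Layer 0a primitives
CONTENT = ["fear", "grief", "joy", "anger"]  # Layer 0b primitives

@torch.no_grad()
def activation(word: str) -> float:
    # Embed the primitive in a neutral carrier sentence (an assumption).
    inputs = tok(f"I think about {word}.", return_tensors="pt")
    hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, dim)
    return hidden.norm(dim=-1).mean().item()

gap = (sum(map(activation, CONTENT)) / len(CONTENT)
       - sum(map(activation, SCAFFOLDING)) / len(SCAFFOLDING))
print(f"content - scaffolding gap: {gap:+.3f}")
```

Whatever the real metric is, the paper's claim is only about the sign and consistency of such a gap across architectures, which any scalar per-primitive score could test.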
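The composition result can also be probed behaviorally. Below is a minimal sketch, assuming a local Ollama server with the qwen2.5:0.5b model from the quickstart; the prompt wording and the substring scoring are illustrative assumptions, not the pre-registered protocol.

```python
# HYPOTHETICAL composition probe; not the repository's actual experiment code.
import ollama

# Three of the eleven pre-registered operator-seed predictions from the table.
PREDICTIONS = {
    ("KNOW", "FEAR"): ["dread", "awareness"],
    ("TIME", "GRIEF"): ["mourning", "melancholy"],
    ("WANT", "NOSTALGIA"): ["longing", "regret"],
}

def probe(operator: str, seed: str) -> str:
    # Prompt wording is an assumption; the paper's exact stimulus is not shown.
    prompt = (f"In one short phrase, what feeling arises when {operator} "
              f"is applied to {seed}?")
    resp = ollama.generate(model="qwen2.5:0.5b", prompt=prompt)
    return resp["response"].lower()

for (op, seed), predicted in PREDICTIONS.items():
    text = probe(op, seed)
    hit = any(word in text for word in predicted)
    print(f"{op} + {seed}: {'match' if hit else 'miss'} -> {text[:60]!r}")
```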
The SRM proposes a three-layer architecture:

```
Symbolic Reasoning Layer (pure code — no LLM)
        ↓ generates primitive combinations
Structure Membrane (small LLM, role-conditioned)
        ↓ translates structure into language
Language Output (English or any modality)
```

Structure is deterministic. Language is emergent. The membrane carries structure into the language space and lets emergence do the rest. This is not a trained system. It is a theoretical architecture grounded in eighteen empirical experiments. Building it is the next phase.

GOG was the first indication that something deeper was possible. It demonstrated that replacing natural language prompts with deterministic symbolic specifications dramatically improves correctness in small language models — a 0.5B parameter model that fails completely on a reasoning task with a natural language prompt succeeds completely with a symbolic spec. The key result:

| Tier | Input | Correctness | Time |
|---|---|---|---|
| RAG | 53,137-token corpus + raw prompt | FAIL 2/5 | 5.71s |
| GOG | 6,323-token context + raw prompt | PARTIAL 4/5 | 11.63s |
| SRM | 6,323-token context + symbolic spec | PASS 5/5 | 0.94s |

The model did not fail because it could not write correct code. It failed because it could not reason about what to write. When the reasoning was done externally and passed in as structure, the language capability was sufficient.

GOG is complete and documented. The full paper, benchmark code, and reproduction instructions are in /gog.

```
/
├── README.md          this file
├── GOG_PAPER.pdf      GOG research paper
├── SRM_RESEARCH.md    SRM research paper
├── /gog               GOG ben
```
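To make the three-layer flow concrete, here is a minimal sketch of the membrane idea: a deterministic symbolic layer (pure code) selects a primitive combination, and a small role-conditioned LLM only translates that structure into language. The keyword rules, prompt, and model choice are assumptions for illustration; the repository's actual router (`sel.core.router` in the quickstart) is not reproduced here.

```python
# HYPOTHETICAL sketch of the three-layer SRM flow; not the repository's router.
import ollama

# Layer 1: symbolic reasoning (deterministic, pure code, no LLM).
def symbolic_layer(text: str) -> tuple[str, str]:
    rules = {
        "miss": ("WANT", "NOSTALGIA"),  # e.g. "I miss my hometown"
        "afraid": ("KNOW", "FEAR"),
        "lost": ("FEEL", "GRIEF"),
    }
    for keyword, combo in rules.items():
        if keyword in text.lower():
            return combo
    return ("FEEL", "JOY")  # arbitrary fallback for the sketch

# Layer 2: structure membrane (small LLM, role-conditioned).
def membrane(combo: tuple[str, str]) -> str:
    operator, seed = combo
    prompt = ("You translate symbolic structure into natural language. "
              f"Express the concept {operator} + {seed} in one sentence.")
    return ollama.generate(model="qwen2.5:0.5b", prompt=prompt)["response"]

# Layer 3: language output.
print(membrane(symbolic_layer("I miss my hometown")))
```

Note that all reasoning happens before the model is called; the LLM never decides what to say, only how to say it, which is the division of labor the benchmark table credits for the SRM tier's 5/5 result.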
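For the GOG tier itself, here is a hedged sketch of what deterministic traversal over import dependencies can look like, in contrast to vector retrieval: build the import graph with `ast`, then breadth-first-search from the target module to select context. The file layout, function names, and traversal policy are assumptions; the real benchmark code lives in /gog.

```python
# HYPOTHETICAL GOG-style context selection; see /gog for the actual benchmarks.
import ast
from collections import deque
from pathlib import Path

def import_graph(root: Path) -> dict[str, set[str]]:
    """Map each local module name to the module names it imports."""
    graph: dict[str, set[str]] = {}
    for path in root.rglob("*.py"):
        deps: set[str] = set()
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[path.stem] = deps  # flat naming is a simplifying assumption
    return graph

def context_for(target: str, graph: dict[str, set[str]]) -> list[str]:
    """Deterministic BFS: every local module reachable from `target`."""
    seen, queue = {target}, deque([target])
    while queue:
        for dep in graph.get(queue.popleft(), ()):
            if dep in graph and dep not in seen:  # follow local modules only
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)

graph = import_graph(Path("."))
print(context_for("router", graph))
```

The same input always yields the same context set, which is what makes the 6,323-token GOG context in the table reproducible in a way that similarity-ranked retrieval is not.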
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.