Grok 4.20 Brings Minimal Improvements over Grok-4.1-fast

hackernews | March 13, 2026 05:17 | 🔬 Research

#ai benchy #gemini #grok 4.20 #grok-4.1-fast #review #benchmark #comparison review

Source: hackernews · Summarized and analyzed by Genesis Park

Summary

On the AI BENCHY comparison, Grok 4.20 Beta scores 7.9 (rank #24) against 6.9 (rank #39) for Grok 4.1 Fast, while Gemini 3 Flash Preview leads with 10.0 (rank #1). The gain comes at a higher price: Grok 4.20 Beta's total run cost was $0.608 versus $0.052 for Grok 4.1 Fast, with input and output prices of $2.000 and $6.000 per 1M tokens versus $0.200 and $0.500. Grok 4.20 Multi-Agent Beta scored lower than both at 6.2 (rank #48) and cost $4.978 in total.

Body

AI BENCHY Compare

Compared models (last updated: 2026-03-30)

| Metric | Grok 4.20 Beta (medium) | Grok 4.20 Multi-Agent Beta (medium) | Grok 4.1 Fast (medium) | Gemini 3 Flash Preview (medium) |
|---|---|---|---|---|
| Score | 7.9 | 6.2 | 6.9 | 10.0 |
| Rank | #24 | #48 | #39 | #1 |
| Consistency | 9.0 | 7.2 | 7.5 | 10.0 |
| Attempt pass rate | 72.6% | 54.9% | 66.7% | 100.0% |
| Flaky tests | 2 | 6 | 5 | 0 |
| Total Runs | 51 | 51 | 51 | 51 |
| Cost per result | 5.525 | 82.962 | 0.568 | 0.972 |
| Total Cost | $0.608 | $4.978 | $0.052 | $0.166 |
| Input Price | $2.000 / 1M | $2.000 / 1M | $0.200 / 1M | $0.500 / 1M |
| Output Price | $6.000 / 1M | $6.000 / 1M | $0.500 / 1M | $3.000 / 1M |
| Output Tokens | 1,487 | 298,948 | 1,189 | 1,640 |
| Reasoning Tokens | 87,922 | 296,529 | 84,595 | 48,270 |
| Response Time (avg) | 8.54s | 8.64s | 23.91s | 11.39s |
| Response Time (max) | 24.21s | 35.28s | 121.79s | 50.16s |
| Response Time (total) | 145.26s | 129.64s | 239.09s | 113.86s |

Charts: Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

| Anti-AI Tricks | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 8.7 | 7.9 | 91.7% | 1 | 3.16s | 268 | 7,583 |
| Grok 4.20 Multi-Agent Beta | 6.9 | 5.8 | 75.0% | 2 | 3.46s | 33,706 | 33,077 |
| Grok 4.1 Fast | 8.7 | 7.9 | 91.7% | 1 | 3.81s | 108 | 4,741 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 4.13s | 305 | 3,490 |

| Combined | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 10.0 | 10.0 | 100.0% | 0 | 20.93s | 227 | 12,212 |
| Grok 4.20 Multi-Agent Beta | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 |
| Grok 4.1 Fast | 10.0 | 10.0 | 100.0% | 0 | 37.64s | 261 | 12,272 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 50.16s | 351 | 12,645 |

| Data parsing and extraction | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 10.0 | 10.0 | 100.0% | 0 | 4.01s | 180 | 5,281 |
| Grok 4.20 Multi-Agent Beta | 10.0 | 10.0 | 100.0% | 0 | 5.54s | 25,306 | 25,051 |
| Grok 4.1 Fast | 10.0 | 10.0 | 100.0% | 0 | 6.63s | 180 | 5,409 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 4.72s | 279 | 5,333 |

| Domain specific | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 5.3 | 10.0 | 33.3% | 0 | 21.33s | 251 | 40,255 |
| Grok 4.20 Multi-Agent Beta | 2.9 | 7.2 | 11.1% | 1 | 24.67s | 164,609 | 163,647 |
| Grok 4.1 Fast | 5.8 | 4.4 | 66.7% | 2 | 121.79s | 11 | 37,657 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 21.12s | 12 | 14,908 |

| General Intelligence | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 10.0 | 10.0 | 100.0% | 0 | 5.78s | 72 | 3,440 |
| Grok 4.20 Multi-Agent Beta | 5.8 | 2.8 | 66.7% | 1 | 6.40s | 15,848 | 15,746 |
| Grok 4.1 Fast | 4.2 | 9.9 | 0.0% | 0 | 16.25s | 127 | 3,456 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 4.09s | 111 | 1,285 |

| Instructions following | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 8.3 | 10.0 | 50.0% | 0 | 4.97s | 57 | 7,107 |
| Grok 4.20 Multi-Agent Beta | 8.3 | 10.0 | 50.0% | 0 | 4.63s | 25,457 | 25,322 |
| Grok 4.1 Fast | 6.6 | 10.0 | 50.0% | 0 | 5.30s | 55 | 3,489 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 6.10s | 72 | 4,558 |

| Puzzle Solving | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 8.2 | 7.2 | 88.9% | 1 | 3.85s | 249 | 6,660 |
| Grok 4.20 Multi-Agent Beta | 7.2 | 5.1 | 77.8% | 2 | 5.01s | 34,022 | 33,686 |
| Grok 4.1 Fast | 5.3 | 7.2 | 44.4% | 1 | 8.08s | 187 | 6,086 |
| Gemini 3 Flash Preview | 10.0 | 10.0 | 100.0% | 0 | 4.43s | 276 | 4,921 |

| Tool Calling | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| Grok 4.20 Beta | 3.0 | 10.0 | 0.0% | 0 | 12.39s | 183 | 5,384 |
| Grok 4.20 Multi-Agent Beta | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 |
| Grok 4.1 Fast | 2.8 | 1.6 | 33.3% | 1 | 27.71s | 260 | 11,485 |
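As a rough sanity check on the cost figures in the overall comparison table, the sketch below estimates the output-side share of each model's Total Cost from its token counts and Output Price. It rests on two assumptions that the source does not confirm: that reasoning tokens are billed at the output price, and that the Output Tokens figure does not already include them. The Multi-Agent row is left out because its output and reasoning token counts nearly coincide, so it is unclear whether they overlap. The function and variable names are illustrative only.

```python
# Token counts, output prices, and total costs are copied from the comparison table.
# ASSUMPTION: reasoning tokens are billed at the output price and are not
# already counted inside the "Output Tokens" figure.
ROWS = {
    # name: (output_tokens, reasoning_tokens, output_price_per_1M_usd, total_cost_usd)
    "Grok 4.20 Beta": (1_487, 87_922, 6.000, 0.608),
    "Grok 4.1 Fast": (1_189, 84_595, 0.500, 0.052),
    "Gemini 3 Flash Preview": (1_640, 48_270, 3.000, 0.166),
}


def output_side_cost(output_tokens: int, reasoning_tokens: int,
                     price_per_million: float) -> float:
    """Estimated cost in USD of all generated tokens at the output price."""
    return (output_tokens + reasoning_tokens) * price_per_million / 1_000_000


def cost_breakdown(rows: dict) -> dict:
    """Map each model to (output-side estimate, remainder of reported total)."""
    breakdown = {}
    for name, (out_tok, reason_tok, price, total) in rows.items():
        estimate = output_side_cost(out_tok, reason_tok, price)
        # Whatever is left over would have to be input-token cost.
        breakdown[name] = (estimate, total - estimate)
    return breakdown


if __name__ == "__main__":
    for name, (estimate, remainder) in cost_breakdown(ROWS).items():
        print(f"{name}: output-side ~${estimate:.3f}, remainder ~${remainder:.3f}")
```

Under these assumptions the output-side estimate stays below the reported Total Cost for all three rows, and the remainder would correspond to input tokens, which the table does not list.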

View original (hackernews)

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found through the source link.
