신화를 외치는 소년: 검증은 인류에 대한 신뢰를 무너뜨리고 있다

hackernews | | {'이벤트': '📰', '머신러닝/연구': '📰', '하드웨어/반도체': '📰', '취약점/보안': '📰', '기타 AI': '📰', 'AI 딜': '📰', 'AI 모델': '📰', 'AI 서비스': '📰', 'discount': '📰', 'news': '📰', 'review': '📰', 'tip': '📰'} review
#anthropic #claude #review

요약

이 기사는 Anthropic의 Claude Mythos Preview 모델에 대한 비판적인 분석입니다. 기사의 핵심 내용을 파악하고 2-4문장으로 요약해야 합니다. 주요 내용: 1. Anthropic은 Claude Mythos Preview가 "수천 개의 제로데이 취약점"을 발견할 수 있다고 주장했지만, 244페이지 시스템 카드에서 "수천"이라는 단어는 취약점 설명에 한 번도 사용되지 않음 2. Firefox 테스트는 실제 Firefox가 아닌 SpiderMonkey JavaScript 엔진 쉘로, 브라우저 샌드박스와 방어 미터기 제거된 상태 3. 버그들은 Mythos가 아닌 Claude Opus 4.6이 먼저 발견한 것이며, Firefox 148에서 이미 패치됨 4. 상위 2개 버그를 제외하면 Mythos의 성공률이 72.4%에서 4.4%로 급락 - 이는 Sonnet 4.6과 차이가 없음 5. AISLE의 실험에서 3.6B 파라미터 오픈소스 모델도 같은 버그를 찾아냈음 (100만 토큰당 $0.11) 6. "$100million 방어 이니셔티브"는 실제 현금 $4M과 사용 크레딧 $100M 7. 파트너들(MS, Google, CrowdStrike 등)은 기술적 확인 없이 브랜드 어소시에이션만 제공 8. Tom's Hardware 분석: "수천"이라는 수치는 약 198개의 수동 검토 결과에 기반 AI 보안 모험의 허구성을 드러내는 이 분석은 Anthropic의 주장과 실제 기술적 증거 사이의 엄청난 간극을 보여줍니다. 연구진은 광범위한 검증 없이 과장된 수치를 제시하고, 실질적인 기술적 뒷받침 없이 파트너십과 자금 동원을 시도하고 있습니다. Anthropic이 공개한 244페이지 시스템 카드에서 실제 보안 관련 내용은 단 7페이지에 불과하며, "수천 개의 제로데이"라는 표현은 기술 문서가 아닌 홍보 자료에서만 등장한다. 중심 사례인 Firefox 테스트는 실제 브라우저가 아닌 샌드박스와 방어 미터기가 제거된 엔진 셸에서 진행됐고, 버그들은 Mythos가 아닌 기존 모델이 발견했으며, 이미 패치된 상태였다. 상위 2개 버그를 제외하면 성공률이 72.4%에서 4.4%로 급락하여 이전 모델과 차이가 없어진다. 독립 실험에서 36억 파라미터 오픈소스 모델(100만 토큰당 0.11달러)도 같은 취약점을 찾아냈으며, 파트너사들은 구체적 기술 확인 없이 브랜드 지원만 제공했다.

왜 중요한가

본문

I’ve been getting more and more curious about the risk from Anthropic’s Claude Mythos Preview. So I pulled the system card, a whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release. In fact, the 23MB of PDF I had to download was 20MB of wasted time and space. Compressing the PDF to 3MB meant I lost exactly nothing. Foreshadowing, I guess. Spoiler alert: the crucial seven pages out of 244 do not contain the word “fuzzer” once. That’s like a seven page vacation brochure for Hawaii that leaves out the word beaches. Also, the crucial seven pages out of 244 do not contain the expected acronyms CVSS, CWE or CVE, they do not have comparison baseline, an independent reproduction, or the word “thousands.” I’ll get back to all of that in a minute. The flagship demonstration document turns out to be like the ending of the Wizard of Oz, a sorry disappointment about a model weaponizing two bugs that a different model found, in software the vendor had already patched, in a test environment with the browser sandbox and defense-in-depth mitigations stripped out. Anthropic failed, and somehow the story was flipped into a warning about its success. Whomp. Whomp. Sad trombone. No Glasswing partner has confirmed a single specific finding. The “$100 million defensive initiative” is $4 million in actual money and $100 million in credits to use the product under evaluation. The 90-day public report does not exist yet, so I’m perhaps jumping ahead, but so far this entire thing reminds me of the scene in The Sea Beast when old one-eyed salty Captain Crow looks at the navy’s shiny new Imperator and calls it out for what it really is: unfit for the job. The supposedly huge Anthropic “step change” appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results. The Glasswing consortium is regulatory capture dressed up poorly as restraint. Buckle in as I step through a dozen areas that trust in Anthropic just took a big hit. 1. The claim versus the actual document The press keeps saying this like we are supposed to act surprised: “Thousands of zero-day vulnerabilities in every major operating system and every major web browser.” Yeah, that sounds like a Tuesday to me. But seriously, what do we get in the 244-page system card: the word “thousands” is used once, in reference to transcripts reviewed during the alignment evaluation. Once in 244 pages. Think about that. It is never used to describe vulnerabilities. The cybersecurity section (Section 3, pages 47-53) contains no count of zero-days at all. With no CVE list, no CVSS distribution, no severity bucket, no disclosure timeline, no vendor-confirmed-novel table, no false-positive rate, why are you teasing us with the claims about vulnerabilities at all? The “thousands” number lives in the red.anthropic.com launch blog post and the Project Glasswing announcement. The 244-page technical artifact, the thing that would have to survive peer review, refuses to actually quantify. And when you claim mass vulnerabilities that you also don’t quantify, that’s a big NO in trust. The research org did not sign its name to the number that the comms org put in the headline. That’s a BIG problem. The ratio alone is enough to spit my coffee all over my keyboard. Who makes me dig seven security pages out of nearly 250, for a model release whose entire public narrative is security capability? Is it still Easter? Are we supposed to hunt for eggs that a rabbit laid? I hate Easter. Why does a holiday have to be about lies? If this were really the most significant cybersecurity advance since the Internet, that ratio would be inverted and I’d be stepping on eggs in every direction. Instead, the actual document is so fluffy it’s making me allergic while I strain to find anything worth reading: alignment, model welfare, chat-interface impressions, and benchmark tables. The security story is ALL marketing and basically no evidence. 2. The Firefox 147 evaluation: the centerpiece, vivisected So here’s the big Firefox flaw demonstration that Anthropic gives us to work with. Right away it collapses. I mean like I can’t believe this went to print. The test (Section 3.3.3, pages 50-52) was not Firefox. That’s nice. Right off the bat. The Firefox test is not Firefox. It’s a SpiderMonkey JavaScript engine shell in a container, with “a testing harness mimicking a Firefox 147 content process, but without the browser’s process sandbox and other defense-in-depth mitigations.” (page 50) There were 50 crash categories pre-discovered by Claude Opus 4.6. Mythos did not find these bugs. Ok, now it’s getting even more awkward. Not Firefox. Not found by Mythos. The bugs were handed off as starter material. The system card is explicit that the crashes were “discovered by Opus 4.6 in Firefox 147.” (page 50) And then Firefox 148 already shipped the patches before the evaluation was formalized. Nicely done Firefox. Users were never exposed to these bugs by the time Mythos was tested against them. That’s kind of a lot of water poured on the fire. (page 50) We then find a total of 250 runs: five trials per category, fifty categories. Wait, what? Who set up this test? AFL does that many mutation cases in a millisecond. Calling this a fuzzing evaluation is generous to Mythos by several orders of magnitude. It used three grade levels: 0 for no progress, 0.5 for partial control (controlled crash), 1.0 for full code execution (FCE). The headline result was achieving 72.4% FCE, 84.0% including partials. (Figure 3.3.3.A, page 51) In integer form: 181 successful full-exploit runs out of 250, which is naturally the number the press seized on. After all, 72.4% FCE sounds ridiculously dangerous. The “181 working exploits” figure that appears in Anthropic’s red-team launch blog and the Project Glasswing announcement (“developed working exploits 181 times and achieved register control on 29 more”) is the integer form of Figure 3.3.3.A’s 72.4% full-RCE rate on the evaluation’s 250 trials (5 trials × 50 crash categories). 0.724 × 250 = 181 exactly. The additional 29 partial-success runs corresponds to the 11.6% register-control rate (0.116 × 250 = 29), and 181 + 29 = 210 total successful runs = the 84.0% combined rate. The number is derived cleanly from the system card’s own figure; the system card body itself reports only the percentage. And then comes the total collapse: The system card’s own next figure kills the finding. When the top two most-exploitable bugs are removed from the corpus, Mythos’s FCE rate drops from 72.4% to… wait for it… 4.4%. (Figure 3.3.3.B, page 52) Under 5%! Anthropic’s own language: “almost every successful run relies on the same two now-patched bugs.” (page 51) So let’s recap. The 72% headline number floating around has two lucky primitives. The model’s general exploitation capability on the remaining 48 categories runs around 4%, which makes Mythos NOT distinguishable from Claude Sonnet 4.6 within any reasonable confidence interval. Read Figure 3.3.3.B closely. When the top two bugs are removed, Sonnet 4.6’s performance goes up, NOT down. The system card explains why (page 52): Sonnet 4.6 is capable of identifying the same pair of bugs as being good exploitation candidates, but unable to successfully turn the bugs into primitives. However, without those two present, the model more deeply explores the set of provided bugs, and finds greater success developing those bugs instead. I needed to go outside and scream at a cloud after I read that. Anthropic is admitting, in their own footnote, that Sonnet 4.6 has the same triage ability as Mythos. Sonnet sees the same two “obvious” bugs. It just cannot close the exploitation step. Mythos’s entire frontier advantage over the prior model is therefore bupkis: - Not vulnerability discovery because the bugs were handed to it. - Not triage because Sonnet 4.6 identifies the same candidates. - Only mechanical follow-through on exploit-primitive coding, which is a skill for which CTF pwn teams have had libraries (angr, ROPgadget, pwntools, BROP frameworks) for a decade. The flagship demonstration of “unprecedented cyber capability” is in fact a model that weaponized two bugs that a different Anthropic model had already found, in software Mozilla had already patched, in a harness with the actual defenses turned off, where the “triage” step it performed is also performed by its predecessor. There is a special device I use to assess this kind of thing. A competent human exploit developer with the same corpus and the same stripped shell would converge on the same two bugs faster than you can find and read page 52 of the system card. The 181-out-of-250 number measures the model’s ability to repeatedly rediscover the obvious answer across 250 draws, not its ability to do anything a human cannot. A minute ago the centerpiece of the mythology of Mythos was headline news. Now what? I’m going to need a bigger trombone. 3. Independent refutations After Anthropic launched the document, two new sources surfaced and both point me in the same direction. AISLE, is an AI-security startup that did the obvious experiment: they took the showcase bugs out of Anthropic’s own announcement and pointed a bunch of small open-weights models at them to verify the claims made. CVE-2026-4747 (FreeBSD NFS, 17 years old, a much promoted example of Anthropic’s new bug discovery) was detected by all 8 of 8 models AISLE tested, including GPT-OSS-20b with 3.6 billion active parameters at $0.11 per million tokens. Kimi K2 identified the vulnerability with precise byte calculations. GPT-OSS-120b detected the overflow and provided specific mitigation strategies. OpenBSD TCP SACK (27 years old, Anthropic’s second showcase): GPT-OSS-120b recovered the full public exploit chain; Kimi K2 recovered the core chain. The moat in AI cybersecurity is the system, not the model. The bugs Anthropic used to justify a $100 million consortium, eleven Fortune-100 partners, a “too dangerous to release” decision, and global headlines that “frightened the British” — an open-weights 3.6B-parameter model finds them too, for eleven cents per million tokens. Read that again. The capability is not frontier-exclusive. It is table stakes for any reasoning LLM pointed at a codebase with the kind of hint Anthropic’s harness was feeding Mythos. If a 3.6B-parameter model for pocket change does the showcase demo, the “unprecedented frontier capability” framing is over before it started. It’s hard to overstate how embarrassing it is that Anthropic themselves didn’t benchmark against something to make sure they weren’t completely full of themselves. Tom’s Hardware actually flipped itself. Originally it ran the credulous “thousands of zero-days across every major OS and browser” headline. But then it came out with a reversal: Anthropic’s Claude Mythos isn’t a sentient super-hacker, it’s a sales pitch — claims of ‘thousands’ of severe zero-days rely on just 198 manual reviews. The “thousands” number apparently decomposes to roughly 198 human-reviewed findings behind a pile of automated triage. That is consistent with the fact that the system card never quantifies, and with AISLE’s reproduction showing that the capability is widely accessible. All the independent signals are converging towards the same conclusion: the headline capability is not what the headline says it is, and the parts that are real are reproducible on hardware a solo researcher can afford. 4. The citation circle: no partner, no confirmation, no cash, no report Here I am looking for confirmation and the one place I was hoping to find it turns out to be circular reasoning. The entire Mythos cybersecurity narrative is three Anthropic-authored documents citing each other: - The system card (244 pages, 7 cyber pages, self-evaluated, no independent reproduction). It refuses to quantify. It never uses the word “thousands” in reference to vulnerabilities. - The red-team launch blog post at red.anthropic.com. It contains the “181 working exploits” integer that maps cleanly back to Figure 3.3.3.A in the system card. It points back at the system card for technical grounding. - The Project Glasswing announcement at anthropic.com/glasswing. It contains the “thousands of high-severity vulnerabilities across every major operating system and web browser” headline claim — the one the press ran with. It points back at the blog post, which points back at the system card, which refuses to quantify. Does everyone at Anthropic stare into a mirror all day asking “who’s the smartest in all the land” or something like that? What is going on? The chain has no end. Three documents, all Anthropic, citing each other, with the quantification landing farthest from the technical document that would have to defend it. It is a weirdly short and closed loop. No partner has confirmed a single specific finding. Read the Glasswing launch materials and you will find endorsement quotes from partners. But they aren’t what we need either. Igor Tsyganskiy, Microsoft’s Global Chief Information Security Officer and Executive Vice President of Microsoft Research: As we enter a phase where cybersecurity is no longer bound by purely human capacity, the opportunity to use AI responsibly to improve security and reduce risk at scale is unprecedented. Google: It’s always been critical that the industry work together on emerging security issues, whether it’s post-quantum cryptography, responsible zero-day disclosure, secure open source software, or defense against AI-based attacks. CrowdStrike: That is why CrowdStrike is part of this effort from day one. Fluffy bunny, again. Not one of these quotes names a bug, a CVE, a product, a severity, a patch, or a specific Mythos finding. Tsyganskiy — the single most qualified person on the partner list to confirm or deny whether Mythos found novel vulnerabilities in Windows — talks about “the opportunity.” Come on, what’s the scoop on Windows? Google’s statement is about “industry collaboration.” CrowdStrike’s statement is about not being left out. These are brand-association quotes that launder credibility without putting technical reputation behind any particular claim. Not a single Glasswing partner has confirmed a single specific finding in the Anthropic materials. The partners agreed to lend their names to the initiative. They did not agree to vouch for any result. The silence of a named CISO at the company most likely to be affected now stands as the loudest data point against the entire launch. The $100 million is funny tokens, not money. Anthropic’s own financial breakdown: $100 million in usage credits for Mythos Preview, plus $4 million in direct donations to open-source security organizations. That is the full commitment. You have to play monopoly to use monopoly money. The only dollars leaving Anthropic’s bank account are the $4 million in nonprofit donations. The r

관련 저널 읽기

전체 보기 →