Show HN: Another experiment with Erdos problems and LLMs

hackernews | 🤖 AI models
#ai models #anthropic #chatgpt #claude #gemini
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Background: I am a coder, not a mathematician, but I found this story quite interesting: https://news.ycombinator.com/item?id=47903126. I wondered how far I could get by picking a random open problem and throwing it at LLMs. Disclosure: I have no idea what the problem even means, let alone whether the output is even remotely correct. My interest is purely in testing capabilities.

Full text

Background: I am a coder, not a mathematician, but I was quite entertained by this story:

https://news.ycombinator.com/item?id=47903126

I wondered how far I could get by just choosing a random open problem and throwing it at LLMs.

Disclosure: I have no idea what the problem even refers to, let alone whether the output is even remotely correct. My interest is purely in testing the capabilities of various models, plus curiosity and entertainment.

Problem: https://www.erdosproblems.com/691

Given $A \subseteq \mathbb{N}$, let $M_A = \{ n \geq 1 : a \mid n \text{ for some } a \in A \}$ be the set of multiples of $A$.
Find a necessary and sufficient condition on $A$ for $M_A$ to have density 1.

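To make the definitions in the problem concrete: $M_A$ has density 1 when $|M_A \cap [1, N]| / N \to 1$ as $N \to \infty$. The lim inf and lim sup of this ratio are the lower and upper densities, and the density exists exactly when the two agree. The Python sketch below computes the ratio for one finite cutoff $N$. It assumes $A$ is a finite list of positive integers, and the function names are hypothetical, chosen for illustration. For an infinite $A$, passing $A \cap [1, N]$ gives the same count, because larger elements have no multiples up to $N$. A finite computation like this can only suggest a limit, not prove one.

```python
from fractions import Fraction


def count_multiples(A: list[int], N: int) -> int:
    """Return |M_A ∩ [1, N]|: how many n in 1..N are divisible by some a in A."""
    if any(a < 1 for a in A):
        raise ValueError("A must contain positive integers only")
    # Sieve-style marking: flag every multiple of each a up to N.
    hit = bytearray(N + 1)  # index 0 is never marked
    for a in set(A):
        for m in range(a, N + 1, a):
            hit[m] = 1
    return sum(hit)


def partial_density(A: list[int], N: int) -> Fraction:
    """Return |M_A ∩ [1, N]| / N, the ratio whose limit (if any) is the density of M_A."""
    return Fraction(count_multiples(A, N), N)


# Multiples of 2 or 3: the ratio settles at 2/3, so this M_A does not have density 1.
print(partial_density([2, 3], 600))  # 2/3
# A containing 1: every n >= 1 is a multiple, so the ratio is exactly 1.
print(partial_density([1, 7], 1000))  # 1
```

Returning a `Fraction` keeps the ratio exact, which makes comparisons such as "is it exactly 1?" reliable.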
My approach:
I used DeepSeek in Expert mode with the same prompt as in the linked HN submission. It thought for a very long time, but I was doing other things in the background, so I didn't time it precisely. I pressed "Continue" twice over roughly 60 minutes; the output says it thought for about 46 minutes.

Once it had generated a proof, I asked Opus 4.7 to review it and then fed the review back into DeepSeek, which made edits, corrections, and refinements. This back-and-forth continued until Opus 4.7 was reasonably satisfied. At that point I brought in Gemini 3.1 Pro Preview, which raised issues that Opus had missed. Opus acknowledged the feedback, and I then passed that feedback to DeepSeek for a final round. According to Opus, what DeepSeek produced is essentially a "clean exposition of a D[avenport]-E[rdos] corollary", not a new result. In all likelihood the result is already known (DeepSeek was not allowed to use the internet for this phase), or it may even be wrong.

In "simple" terms:

The argument actually proves a stronger fact for every set $A$ of natural numbers:
The upper density of the set $M_A$ equals the largest possible lower density you can get from finite subsets of $A$, and that also equals the lower density of $M_A$.
When the upper density is 1, it forces the lower density to also be 1, so the natural (ordinary) density exists and equals 1 automatically, without needing any extra conditions.
The only non-basic part of the proof is the Davenport–Erdős theorem; everything else is simple.

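The quoted summary compares densities of $M_F$ for finite subsets $F \subseteq A$ with the lower and upper densities of $M_A$. For reference, the Davenport–Erdős theorem states that $\underline{d}(M_A) = \lim_{x \to \infty} d(M_{A \cap [1, x]})$, so the lower density of $M_A$ is the limit of these finite-subset densities. For a finite $F$ the density can be computed exactly.

The minimal Python sketch below assumes finite lists of positive integers, and its function names are hypothetical, not taken from the linked chats. It only illustrates the finite side of the statement and does not check the model's argument.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def density_of_multiples(F: list[int]) -> Fraction:
    """Exact natural density of M_F for a finite set F of positive integers.

    Every a in F divides L = lcm(F), so whether a divides n depends only on
    n mod L. M_F is therefore periodic with period L, and its density is the
    fraction of one full period [1, L] that it covers.
    """
    L = lcm(*F)  # lcm() of no arguments is 1, giving density 0 for empty F
    hits = sum(1 for n in range(1, L + 1) if any(n % a == 0 for a in F))
    return Fraction(hits, L)


def prefix_densities(A: list[int]) -> list[Fraction]:
    """d(M_F) for the growing prefixes F = A[:1], A[:2], ... of A.

    Adding elements can only enlarge M_F, so the values never decrease.
    """
    return [density_of_multiples(A[:k]) for k in range(1, len(A) + 1)]


# First few primes: 1/2, 2/3, 11/15, 27/35 (the complement is prod(1 - 1/p)).
print(prefix_densities([2, 3, 5, 7]))
```

Scanning one period is simple but slow, because the period $\operatorname{lcm}(F)$ grows quickly. The approach is only practical for small $F$. Inclusion–exclusion over the lcms of subsets is an alternative for short lists with large elements.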
In any case, these were my takeaways:

- These new models seem surprisingly capable, especially when used in conjunction with each other, even with fairly simple prompts.
- I am quite impressed by DeepSeek. I'm going to review its coding ability, and may even switch completely from Anthropic.
- This was a genuinely interesting exercise, even if I have no idea whether any of it is correct or useful.

Some other observations:

- Opus was really fast at reviewing DeepSeek's output: literally seconds.
- Gemini had trouble figuring out what "Erdos 691" referred to.
- The free version of ChatGPT generated mostly useless output, so I didn't include it.

Chat links:

https://chat.deepseek.com/share/hpguvrhcxn226bi3hn
https://claude.ai/share/4f3ccad1-d862-4e37-8333-8a1ebd84b38f
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221cRWKS3ngW_nqfSn3W-Kq_bWlk8FWXmDw%22%5D,%22action%22:%22open%22,%22userId%22:%22100878499144503719961%22,%22resourceKeys%22:%7B%7D%7D&usp=sharing

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be viewed via the source link.
