AI Utility: A Quest for Clarity
hackernews
🔬 Research
#ai utility
#review
#clarity
#software engineering
#agentic coding
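Original source: hackernews · Summary and analysis by Genesis Park
Summary
When LLMs are used in software engineering, "universal statement" problems that are hard to verify, such as security and performance, cannot be handed over to them because of their inaccuracy. By contrast, "existential statement" problems, which ask whether a solution exists, can be delegated to an LLM to cut search costs, provided an efficient verification method is available. However, because the same tools also give attackers a scalable means of exploitation, engineering work calls for more careful human review and greater rigor.
Body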
Ever since agentic coding became all the rage in online and offline software engineering circles, I have felt what may be termed "vibe coder's impostor syndrome": a worry that I am holding it wrong. The following is an attempt to overcome this impasse by characterizing a class of problems that may be reliably delegated to LLM agents.

Intelligence & rigor

We naturally feel that a person's intelligence and rigor generally go hand in hand. This is not the case for LLMs; they are irredeemably untrustworthy oracles. By rigor I mean the discipline required to steer clear of security vulnerabilities, performance degradation, broken features, and so on. Intelligence, on the other hand, is a measure of knowledge and analytical skill.

If an engineer's labor requires more rigor than what is achievable by LLMs (which is not a difficult threshold to cross), then they are not replaceable. Access to LLMs' immense knowledge and poor (but highly scalable) analytical skill may make one more efficient and more rigorous. But how does one leverage LLMs effectively in a rigorous domain?

Existential statements vs. universal statements

Efficient methods of solution verification make LLMs (cost-)effective. Needless to say, if verifying LLM output requires more effort than producing it manually, then burning tokens is not worthwhile.

Universal statements in logic assert that a property holds for every object in a set, e.g. "the square of every real number is non-negative". To prove such a statement, one needs to carefully construct a sound step-by-step proof, and the more complex the logic of the proof, the harder it is to verify. Errors in some mathematical proofs have only been spotted decades after the fact. For a software engineer, a correct program must compute the desired data or side effects given any possible input; this is a universal statement as well.

Conversely, existential statements assert the existence of at least one object that satisfies a certain property, e.g. "There exists a valid solution to this 9×9 Sudoku puzzle". If we could somehow obtain a candidate solution from a dubious oracle, then verifying said solution by substitution would be more efficient than solving the Sudoku puzzle by hand. We don't stand to waste much effort if the solution turns out to be incorrect.

If a problem may be framed in terms of an existential statement while also supporting an efficient verification method, then it may be reliably delegated to an LLM, not for the sake of using AI in and of itself, but because we stand to spare ourselves the substantial effort of searching the solution space. The following table illustrates this perspective with three concrete use cases.

| Use case | Example | Verification method |
|---|---|---|
| Generate safe-to-fail code | Sandboxed throwaway shell scripts | Execute it |
| Root cause analysis | Searching darwin-xnu for the source of an EADDRNOTAVAIL | Test the hypothesis |
| Discovery of API elements | Finding out how to achieve a style in CSS | Cross-check with API documentation |

Adversarial advantage

Another important class of existential statements that may be effectively tackled with LLMs falls within security research, e.g. "is there an exploitable vulnerability in this codepath?". One can easily picture agentic systems that continuously scan open-source libraries and network services for exploitable bugs.

The idea of agentic coding, i.e. the process of building systems with large swathes of LLM-generated code and human Cursor-y verification, is doomed from the start within rigorous domains, and not only because of LLMs' inherent lack of rigor: agentic coding is sabotaged by the very existence of agents, since the same tools provide adversaries with a scalable means of exploitation. This asymmetric advantage applies retroactively too: pre-existing low-quality systems are also at increased risk of exploitation.

I hope for greater care in software, certainly not less and less.
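
To make the Sudoku example from the discussion of existential statements concrete, here is a minimal Python sketch of verification by substitution. The function name `is_valid_solution`, the grid encoding (0 for an empty cell), and the sample grids are assumptions made for illustration; none of them come from the original article.

```python
from typing import Sequence

# Assumed encoding: a 9x9 grid of ints, where 0 marks an empty cell in a puzzle.
Grid = Sequence[Sequence[int]]


def is_valid_solution(puzzle: Grid, candidate: Grid) -> bool:
    """Check a candidate from an untrusted source against a 9x9 puzzle."""
    # Reject anything that is not a 9x9 grid before indexing into it.
    if len(candidate) != 9 or any(len(row) != 9 for row in candidate):
        return False

    # The candidate must agree with every given (non-zero) puzzle cell.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != candidate[r][c]:
                return False

    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    digits = set(range(1, 10))
    rows = [list(row) for row in candidate]
    cols = [[candidate[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [candidate[br + dr][bc + dc] for dr in range(3) for dc in range(3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(set(unit) == digits for unit in rows + cols + boxes)


# Hypothetical sample data: a solved grid built from a shifting pattern,
# a puzzle derived from it by blanking cells, and a tampered candidate.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [
    [v if (r + c) % 3 == 0 else 0 for c, v in enumerate(row)]
    for r, row in enumerate(solved)
]
tampered = [row[:] for row in solved]
tampered[0][0], tampered[0][1] = tampered[0][1], tampered[0][0]

print(is_valid_solution(puzzle, solved))    # True
print(is_valid_solution(puzzle, tampered))  # False
```

The check inspects a fixed number of cells and 27 units regardless of how hard the puzzle was to solve, which is the asymmetry the argument relies on: a wrong candidate is rejected cheaply, so little is lost by asking an unreliable oracle.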
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.