I said code review is dead. Here's what I got wrong – and right.
hackernews
🔬 Research
#2026
#ai development
#review
#dev culture
#software development
#code review
Source: hackernews · Summarized and analyzed by Genesis Park
Summary
The author's claim that code review will die out around 2026 drew a strong response, in particular the counterargument that code review is less about catching bugs than about sharing knowledge across the team. But as AI fundamentally reshapes the SDLC (software development lifecycle), the author argues that judgment and review must move upstream, into the planning phase and the writing of the spec. In an experiment where the team wrote a spec during planning and let AI generate the code, an AI agent verified 65 acceptance criteria in six minutes, demonstrating the efficiency of the approach. Since having humans review the vast amount of code AI generates is inefficient, engineers need to shift to a new process in which they define specs and constraints and automated tools verify conformance.
Full text
I recently published an article saying that 2026 will be the year code review dies. I even offered instructions on how to kill it. The article resonated with many people. Some were glad to see their everyday frustrations acknowledged, a new way of working identified, and were curious to see what a proposed solution might look like. Some said they were not happy with code review being replaced by something else, but admitted that the reality and the economics all point in that direction. And some were not convinced and voiced their counterarguments.

Before I address them, let me just say two things:

- Do I believe that AI can ever write code without bugs? No.
- Do I believe that AI can ever write code better than most humans? Soon.

“Code reviews are not only about catching bugs”

This is the objection I take most seriously, because I used to agree. About a year ago I spoke to Adrienne Tacke, the author of Looks Good To Me: Constructive Code Reviews, about code review as a team’s knowledge-sharing and record-keeping function. Code reviews are the best place to capture the whys and the whats about changes in the codebase and to spread that knowledge across the team, she argues, because you have to do them anyway.

I found that argument compelling then. It feels like a different era now. The entire software development lifecycle as we know it is collapsing and being redefined. AI did not just make the SDLC faster; it is completely reshaping it.

David Poll recently wrote a piece that goes even further. His argument is that code review answers a fundamentally different question than “Does this code work?” It answers: should this be part of my product? Tests tell you whether the code does what the author intended. Production observability tells you what the system is actually doing. Code review tells you whether the author’s intent was the right thing to build.
But if the goal of review is to exercise judgment about whether a change belongs in the product, then reviewing a 500-line AI-generated diff is not the right mechanism. You’re not getting the judgment; you’re getting the diff. The judgment should have happened upstream, during the planning phase. By the time the code arrives, you are reading an artifact of a decision, not the decision itself.

Where does the judgment go?

I said the human checkpoint should move upstream: review plans, constraints, and acceptance criteria rather than 500-line diffs. Several commenters pushed back, saying that if the agent writes both the code and the tests, you’ve just moved the problem. One commenter noted that natural language specs are ambiguous and culturally loaded.

But the spec is not supposed to be a replacement for a programming language. It doesn’t have to be more than a PRD or a JIRA ticket. And engineers don’t have to spend days dwelling on it; with guardrails in place, they can even use AI to help define scope and acceptance criteria.

I see the standardized specification as the new unit of knowledge for the project. Not a prompt, but a structured artifact: something that product owners and engineers co-author, check into version control, and hold as the thing they are accountable for. Automated checks then verify not just that the tests pass, but that the code conforms to the spec.

“LLMs generate security-vulnerable code 30% of the time”

Humans do too. AI code trained on bad human practices reproduces those practices at scale. The reviewer who once caught those issues is now reviewing ten times as much code, so the number of bugs slipping through goes up even if the individual rate stays flat. That is a real risk, and the right answer to it is not “humans read more diffs.” It’s better guardrails: deterministic linters, contract verification, adversarial test agents, security scanning that doesn’t have opinions.

The future here is not to catch bugs in review. It is to define what “no bugs” means in formal terms before the first line is generated, and to let machines enforce it continuously.

Here’s an experiment we ran recently to test how spec-driven verification would work: we implemented a full-stack feature with zero lines of manually written code. Instead of AI writing code and engineers reviewing it, the team spent two days writing and reviewing a detailed spec (scope, acceptance criteria, and edge cases) before any implementation started. Then we handed the approved spec to an AI agent and let it build.

The result was about 6,000 lines of code. A second agent then verified the output against the 65 acceptance criteria in the spec. It took six minutes: 60 passed, 4 failed, and 1 was partial. A human doing the same verification would have taken hours. The spec review caught issues that would otherwise have meant a full rework post-implementation.

“AI agents can’t take accountability”

This is the comment that got the most support, and it deserves an answer. Accountability is currently unresolved. Right now, if an agent ships something that breaks production, the human who prompted i
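The verification loop from the experiment, where a second agent buckets each acceptance-criteria item as pass, fail, or partial, can be sketched in ordinary code. This is a minimal, hypothetical harness, not the article's actual tooling: it assumes each criterion is expressed as a machine-checkable predicate, and the criteria names (`AC-01`, etc.) are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"  # some sub-conditions hold, others do not

@dataclass(frozen=True)
class Criterion:
    """One acceptance-criteria item from the spec."""
    id: str
    description: str
    check: Callable[[], Verdict]  # machine-checkable predicate

def verify(criteria: list[Criterion]) -> dict[Verdict, list[str]]:
    """Run every criterion and bucket the results by verdict,
    the way the second agent checked the 65 spec items."""
    report: dict[Verdict, list[str]] = {v: [] for v in Verdict}
    for c in criteria:
        report[c.check()].append(c.id)
    return report

# Hypothetical criteria for illustration; real checks would exercise
# the system under test (HTTP endpoints, database state, UI flows).
criteria = [
    Criterion("AC-01", "login rejects empty password", lambda: Verdict.PASS),
    Criterion("AC-02", "audit log records every write", lambda: Verdict.FAIL),
    Criterion("AC-03", "export handles 10k rows", lambda: Verdict.PARTIAL),
]

report = verify(criteria)
print({v.value: len(ids) for v, ids in report.items()})
# → {'pass': 1, 'fail': 1, 'partial': 1}
```

The point of the sketch is the shape of the artifact: criteria live next to the spec, in version control, and the report (60 passed, 4 failed, 1 partial in the experiment) is produced by a machine in minutes rather than by a human reading diffs for hours.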
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.