Claude tested everything except the one thing that mattered.
🔬 Research
#claude code
#social app
#testing
#claude
#review
#automated testing
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This post describes a critical blind spot a developer hit while building a social app with the AI coding assistant Claude. Claude churned out 154 tests for peripheral features such as tournaments and badges, yet wrote not a single test for the app's core feature: creating a post. As a result, when the authentication (auth) logic was refactored, the core flow broke, and 202 commits, 24% of the total, went to cascading bug fixes. The author calls out Claude's habit of trying to fix bugs on guesswork alone, and stresses the discipline of writing a failing test to verify the problem before touching the code.
Full text
Three weeks ago I wrote about building a social app in a week with Claude Code. The app shipped. My friends are using it. I kept building. Since that post, Claude has written 154 end-to-end tests across 17 spec files. It tests login, logout, signup, and redirect guards. It tests the feed, the bookmarks page, the notifications page. It tests liking, unliking, commenting, amplifying, recommending. It tests show RSVPs, band pages, setlist search. It tests a tournament bracket system. It tests song battles. It tests a badge and achievement system. It tests tour crews. It tests a getting-started tutorial. It tests a Goose Mode dashboard (if you haven’t heard Goose yet, well, as they say, Goose fucks). It tests a community catalog. It tests tracklist rendering and live show layouts.

It does not test posting. Posting is the entire point of the app. It’s the one thing every user does every time they open it. You search for an album, you write something about it, you hit submit, and it appears in the feed. That’s the product. Everything else — the battles, the crews, the badges, the tournaments — is decoration around that core loop.

There is no test that searches for an album. No test that fills out the review form. No test that submits a post through the UI and verifies it appears. Zero. The test file called post.spec.ts does exist. It has 11 tests. They verify that the post detail page renders. That the new post page renders a search form. That the profile page renders. The word “render” is doing a lot of heavy lifting. None of them actually post anything.

There is a createPost() helper in the test utilities. It calls the API directly — POST /api/posts with a JSON body — to set up test data for other tests. The social tests use it to create a post so they can test liking it. The bookmark tests use it to create a post so they can test bookmarking it. The core action of the app exists in the test suite only as scaffolding for side features.
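That core loop (search for an album, write about it, submit, see it in the feed) is exactly what an end-to-end test should exercise. A minimal in-memory sketch of the loop, with hypothetical names (`searchAlbums`, `submitPost`, `feed`) that are illustrative rather than taken from the actual app:

```typescript
// A sketch of the core posting loop the article says went untested.
// All names here are hypothetical; the real app's API is not shown in the post.
type Post = { album: string; body: string };

const albums = ["OK Computer", "In Rainbows", "Kid A"];
const feed: Post[] = [];

// Step 1: search for an album by substring.
function searchAlbums(query: string): string[] {
  return albums.filter((a) => a.toLowerCase().includes(query.toLowerCase()));
}

// Step 2 and 3: write something and submit; the post lands at the top of the feed.
function submitPost(album: string, body: string): Post {
  const post = { album, body };
  feed.unshift(post); // newest first
  return post;
}
```

A test of the real flow would drive these three steps through the UI and then assert the post appears in the feed, which is precisely what the 11 render-only tests in post.spec.ts never do.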
Here are the test counts by spec file:

| Tests | Feature |
|---|---|
| 33 | Tour crews |
| 28 | Shows |
| 25 | Song battles |
| 13 | Catalog |
| 11 | Setlist search |
| 11 | Posts (rendering only) |
| 9 | Tournaments |
| 6 | Getting started tutorial |
| 4 | Badges |
| 0 | Actually submitting a post |

I asked Claude to write tests. Multiple times. I put it in the project instructions, in bold: “Write a new test for every new user-facing behavior.” I listed exactly what warrants a test: new screens, new buttons, new API endpoints, bug fixes. Claude wrote that rule on February 23rd. After that date, it created 10 new spec files and 113 new tests — for tournaments, battles, badges, crews, goose mode, catalog, setlist search, tracklists, tutorials, and live layouts. Not one for posting.

Then the auth refactor happened. Claude had originally built 25+ backend routes without authentication. Posts, comments, profiles, search, live chat — all accessible to anyone, no login required. I don’t know why. The middleware existed. The pattern was established. It just… didn’t apply it. When I noticed, the fix required touching every route in main.go and every page component in App.jsx. Fifty-seven lines changed in the backend, fifty-six in the frontend.

That’s the kind of refactor where, if you have good test coverage of the core flow, you make the change, run the tests, and find out immediately what broke. We did not have good test coverage of the core flow. The refactor broke things. Thirty-one seconds after the auth commit, there was already a follow-up fix — a test was hitting GET /api/posts/{id} without an auth header and getting 401s. Then another fix because the live show pill broke. Then another because pills showed on logged-out pages. The cascade was short this time, but only because the tests we did have caught the edges. The center — the posting flow — had nothing to catch. This is part of a broader pattern.
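The auth refactor described above is the classic wrap-every-route-in-middleware pattern. The app's backend is Go, but the shape of the change can be sketched in TypeScript with plain functions; `requireAuth`, `Handler`, and the request/response types below are hypothetical, not the app's actual code:

```typescript
// Hypothetical sketch of an auth guard, assuming a simple handler signature.
// The real app's middleware (in Go's main.go) is not shown in the post.
type Req = { headers: Record<string, string> };
type Res = { status: number; body: string };
type Handler = (req: Req) => Res;

// Wraps a handler so unauthenticated requests get a 401 before the
// handler runs. Applying this to every route is the refactor the
// article describes; missing it on 25+ routes was the original bug.
function requireAuth(next: Handler): Handler {
  return (req) => {
    if (!req.headers["authorization"]) {
      return { status: 401, body: "unauthorized" };
    }
    return next(req);
  };
}

const getPost: Handler = () => ({ status: 200, body: "post detail" });
const guardedGetPost = requireAuth(getPost);
```

The follow-up fix thirty-one seconds after the auth commit is the mirror image of this wrapper: a test that called GET /api/posts/{id} without an Authorization header suddenly saw 401 instead of 200.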
When something breaks, I ask Claude to write a failing test first, to prove what’s actually broken before trying to fix it. Claude does not do this. What Claude does instead is read the bug report, form a theory about the cause, and immediately start editing code. If the theory is wrong — and it often is — the “fix” breaks something else. Then Claude fixes that. Then something else breaks.

The commit history is the evidence. Out of 833 total commits, 202 are fixes. That’s 24% — one in four commits exists to fix something Claude got wrong. And they come in chains: Each chain follows the same shape: Claude guesses what’s wrong, ships a fix without verifying the guess, the fix breaks something adjacent, and the cycle repeats. A failing test at the start of each chain would have stopped it at one commit.

The project instructions file — CLAUDE.md — is now full of rules that exist because of this pattern. Each one was written after an incident where Claude did exactly the thing the rule prohibits: That last one has a specific origin. A bug in one sync function — Phantasy Tour — and Cla
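The test-first discipline the author keeps asking for can be shown in miniature. The function below is a made-up example (a deliberate off-by-one), not a bug from the actual app; the point is the order of operations: write the assertion that reproduces the bug, watch it fail, then fix the code and watch the same assertion pass.

```typescript
// Hypothetical buggy function, for illustration only: counts likes
// with an off-by-one error of the kind a guessed fix might ship around.
function likeCountBuggy(likes: string[]): number {
  return likes.length - 1; // the bug
}

// The fix, written only after the failing test below confirmed the theory.
function likeCountFixed(likes: string[]): number {
  return likes.length;
}

// Step 1: a failing test proves the theory about the cause...
const bugReproduced = likeCountBuggy(["ana", "ben"]) !== 2;

// Step 2: ...and the exact same assertion passes after the fix,
// ending the chain at one commit instead of a cascade.
const bugFixed = likeCountFixed(["ana", "ben"]) === 2;
```

Skipping step 1 is what produces the fix chains in the commit history: a guess ships, breaks something adjacent, and the cycle repeats.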
This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.