Semi-Automated Code Review at Work with Claude Code
hackernews
🔬 Research
#claude
#claude code
#review
#development workflow
#work automation
#code review
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This post describes how the author built a semi-automated workflow around Claude Code to relieve the code review bottleneck that arises in agent-centric development. Local slash commands such as `/grove_*_review` kick off the review, and full E2E tests (covering happy, sad, and chaotic paths) run automatically on the local machine, with failures diagnosed in context rather than reported as bare failures. The key idea is to leave only linting and unit tests to traditional CI, run the heavier verification locally, and have the developer integrate the results before a PR is even created, improving overall development efficiency.
Body
P.S. I took the images on my widescreen monitor while working on a real feature. If there's interest, I'm happy to put together smaller screenshots or a walkthrough video.

TL;DR

Our workflow today looks roughly like this:
- Kick off an agent review locally via /grove_*_review
- The agent reviews the diff, runs best-practice checks, and spins up the system locally
- Full E2E flows run automatically (happy, sad, chaotic paths)
- Failures are diagnosed with context instead of just "test failed"
- A PR sweep catches regressions and tech-debt risks
- Humans review mission-critical logic manually
- A final PR description is generated automatically

The key idea: leverage tools off the shelf, semi-automate (not completely automate) the things specific to your domain, and do as much of it locally as possible.

Code Reviews are ~~the new~~ still the Bottleneck

It seems like everyone is talking about how code reviews are the new bottleneck in the era of agentic software development. There is some truth to it, but if you've been around long enough, you know it's not new. The problem just looks a little different now. The approach and the solutions are evolving. They're not the same, but they rhyme.

There are lots of approaches, tools, and companies tackling this problem. We've tried or looked into Claude Code Review, CodeRabbit, and PropelCode. They're all good and will get you at least halfway there. However, the other half is the hard one. It's the part that's specific to your domain, your product, your tech stack, your culture, and your taste.

This is just a quick show-and-tell of our lightweight workflow around Claude Code. It's simple, custom-tailored, and easy to use. Most importantly, it meets developers where they already are.

The Stack

Three repos:
- Grove API (Backend)
- Grove App (Frontend)
- Grove Extension (Chrome Extension)

The majority of these codebases were developed in an agent-first environment. The code in the frontend and extension is what some would refer to as "vibe-coded". The backend is a bit more mature and reviewed in depth because it deals with fund management. I still review the core logic line-by-line, but I haven't written a single line of it myself.

CI vs Local Automation

We still run traditional CI, but its role has changed. Traditional CI only runs linting and unit tests. The heavier work happens locally and integrates (not replaces) the human. The agent spins up the stack, runs E2E flows, and reviews the branch before a PR even exists. This approach is similar to the direction described by DHH when moving CI back to developer machines. Our end-to-end tests also double as production smoke tests that run on GitHub after deployment.

🔥 Step 1: Kick off the review via /grove_*_review

Each repo has a custom local slash command that kicks off the review:
- /grove_app_review
- /grove_api_review
- /grove_extension_review

The command does a handful of things:
- runs git diff against the default branch
- builds context for the agent
- checks cosmetic changes
- validates best practices
- prepares E2E testing

Let's assume you just ~~vibe-coded~~ engineered a big new feature with the help of agents, and it's time to start reviewing your work.

‼️ Like any AGENTS.md file, this is not one-and-done. I use /session-commit to keep it updated. Documentation is a living thing. It's the responsibility of both the human and the agent to update and review these commands and files regularly whenever something new is learned, a pattern emerges, or a change happens.
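To make the "builds context for the agent" step more concrete, here is a minimal sketch of the kind of pre-work such a review command implies, written as a standalone Node/TypeScript script. The real /grove_*_review commands are Claude Code slash commands rather than a script, and every detail below (the default branch name, the AGENTS.md path, the output format) is an illustrative assumption, not the author's actual implementation.

```typescript
// Hypothetical sketch of the pre-review steps behind a /grove_*_review-style
// command. All names here are assumptions for illustration only.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

function sh(cmd: string): string {
  return execSync(cmd, { encoding: "utf8" }).trim();
}

// 1. Diff the current branch against the default branch (assumed to be `main`).
const defaultBranch = "main";
const diff = sh(`git diff ${defaultBranch}...HEAD`);
const changedFiles = sh(`git diff --name-only ${defaultBranch}...HEAD`).split("\n");

// 2. Build context for the agent: the diff plus the repo's living guidelines
//    (assumed here to live in AGENTS.md).
const guidelines = readFileSync("AGENTS.md", "utf8");
const context = [
  "## Changed files",
  changedFiles.join("\n"),
  "## Diff",
  diff,
  "## Repo guidelines",
  guidelines,
].join("\n\n");

// 3. This context is what the agent would use to check cosmetic changes,
//    validate best practices, and prepare the local E2E run described below.
console.log(`${context.length} characters of review context prepared`);
```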
Step 2: DO ALL THE THINGS

Once the initial review completes, it returns a report, a summary, and a list of actionables. The actionables cover many things we've learned we need to manage:
- updating docs or flows
- fixing terminology
- adding unit/integration/e2e tests
- adding logs
- following existing patterns
- linting

Running end-to-end tests is one of the most important steps. Usually I tell it to DO ALL THE THINGS, but it really depends on the situation.

Step 3: Review the results

Here the human actually reviews the results and jumps in where needed. In this particular case, Docker wasn't running, so the end-to-end tests couldn't start. Not a big deal, but something I prefer not to delegate to an agent.

Step 4: E2E Test Results

This is one of my favorite parts of the API. It spins up a database, starts the server, and runs realistic end-to-end flows against it. Happy paths, sad paths, chaotic paths. 🎢 Everything. If something fails, the agent tries to diagnose the root cause before surfacing it. No "test failed" without context. It tells you why. 🔍 Claude also fixes some of them (if trivial) along the way, and is instructed not to fix anything where the business-logic change is questionable or requires another opinion. The E2E tests report back with a clear pass/fail matrix (a rough sketch of such a harness follows at the end of this section).

Step 5: Pull Request Sweep

From here, I use my personal agent skills before moving into manual review. In particular, I've found that cmd-pr-sweep from Olshansk/agent-skills is great at catching major regressions and tech-debt risks.
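As referenced in Step 4, here is a rough sketch of what a local E2E harness with happy, sad, and chaotic paths and a pass/fail matrix could look like. It assumes an API already running locally; the endpoints, flow names, and API_URL variable are all hypothetical, and the actual Grove test suite is not shown in the post.

```typescript
// Hypothetical E2E harness: runs happy/sad/chaotic flows against a locally
// started stack and prints a pass/fail matrix. Requires Node 18+ (global fetch).
type FlowResult = {
  flow: string;
  path: "happy" | "sad" | "chaotic";
  passed: boolean;
  detail?: string;
};

async function runFlow(
  flow: string,
  path: FlowResult["path"],
  fn: () => Promise<void>,
): Promise<FlowResult> {
  try {
    await fn();
    return { flow, path, passed: true };
  } catch (err) {
    // Keep the failure context so the agent can diagnose a root cause
    // instead of surfacing a bare "test failed".
    return { flow, path, passed: false, detail: err instanceof Error ? err.message : String(err) };
  }
}

async function main() {
  const base = process.env.API_URL ?? "http://localhost:3000"; // locally spun-up server
  const results: FlowResult[] = [];

  // Happy path: a normal request succeeds.
  results.push(await runFlow("create account", "happy", async () => {
    const res = await fetch(`${base}/accounts`, { method: "POST", body: JSON.stringify({ name: "test" }) });
    if (!res.ok) throw new Error(`expected 2xx, got ${res.status}`);
  }));

  // Sad path: invalid input is rejected cleanly.
  results.push(await runFlow("reject bad input", "sad", async () => {
    const res = await fetch(`${base}/accounts`, { method: "POST", body: "not json" });
    if (res.status !== 400) throw new Error(`expected 400, got ${res.status}`);
  }));

  // Chaotic path: the system degrades gracefully on unexpected requests.
  results.push(await runFlow("unknown route", "chaotic", async () => {
    const res = await fetch(`${base}/definitely-not-a-route`);
    if (res.status >= 500) throw new Error(`expected graceful 4xx, got ${res.status}`);
  }));

  // The pass/fail matrix the human (and the agent) read at the end.
  console.table(results);
  process.exitCode = results.every((r) => r.passed) ? 0 : 1;
}

main();
```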
This analysis was written by the Genesis Park editorial team with the help of AI. The original post is available via the source link.