Codebase Readiness Grid: Can Your Repository Handle AI Agents?
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This skill runs the Codebase Readiness Grid, a nine-dimension diagnostic for AI-native development, and derives a readiness level from the lowest dimension score rather than the average. It evaluates test coverage, type strictness, module boundary clarity, and more, producing a scorecard and a prioritized remediation plan. The result helps you judge whether a repository can effectively support AI agents.
Body
A Claude Code skill that runs the Codebase Readiness Grid, a nine-dimension diagnostic for AI-native development. By Kenogami. Part of the AI-Native Transformation Framework.

Given a codebase, the skill produces a scorecard (1–5 per dimension), a readiness level (set by the lowest score, not the average), and a prioritized remediation plan.

| # | Dimension | Question it answers |
|---|-----------|---------------------|
| 1 | Test coverage and feedback latency (blocking) | Can the agent get fast, useful signal on whether changes break things? |
| 2 | Type strictness (blocking) | Can the agent reason about inputs and outputs without reading everything? |
| 3 | File size and context legibility | Can the agent understand one file independently? |
| 4 | Module boundary clarity | Can the agent modify one thing without breaking another? |
| 5 | API directness (blocking) | Are API calls visible at the call site, or hidden behind abstractions? |
| 6 | Documented intent | Can the agent distinguish intentional behavior from historical accident? |
| 7 | Observability | Can failures be localized and reproduced in production? |
| 8 | Dev and deploy simplicity | Can anyone go from fresh clone to shipped feature without fighting infrastructure? |
| 9 | Dependency and runtime currency | Does the stack match patterns current AI models are trained on? |

Dimensions 1, 2, and 5 are blocking: low scores there compromise agent work fundamentally and can't be compensated for by high scores elsewhere. The other six are constraining: low scores degrade quality but don't block agent work outright.
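To make dimension 5 concrete: an agent reading a call site hidden behind a generic factory cannot tell which endpoint is hit or what the response looks like, while a direct call keeps that information at the call site. The sketch below is a hedged TypeScript illustration only; the names (`clientFactory`, `getUser`, the example URL) are invented for this post and are not code from the skill or from any assessed repo.

```typescript
// Hedged illustration of "API directness" (dimension 5). All names are hypothetical.

interface User {
  id: string;
  name: string;
}

// --- Opaque call site --------------------------------------------------------
// The factory hides the endpoint, HTTP method, and response shape, so an agent
// editing this call site has to chase the abstraction to learn what it does.
const clientFactory = {
  create: (service: string) => ({
    invoke: async (method: string, params: Record<string, unknown>): Promise<unknown> => {
      return fetch(`https://api.example.com/${service}/${method}`, {
        method: "POST",
        body: JSON.stringify(params),
      }).then(r => r.json());
    },
  }),
};

async function getUserOpaque(id: string): Promise<User> {
  const api = clientFactory.create("users");
  return (await api.invoke("getUser", { id })) as User;
}

// --- Direct call site --------------------------------------------------------
// The endpoint and the response type are visible exactly where the call is made,
// so the agent can reason about inputs and outputs locally.
async function getUserDirect(id: string): Promise<User> {
  const res = await fetch(`https://api.example.com/users/${id}`);
  if (!res.ok) throw new Error(`getUser failed: ${res.status}`);
  return (await res.json()) as User;
}
```

The sample report later in this post scores D5 at 2 for exactly this pattern: a factory abstraction sits in front of 89% of API call sites.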
The skill also classifies the codebase state (greenfield / brownfield / hybrid) and applies a deferral credit: intentional, documented deferrals score one level higher than undocumented gaps. Scorecards are never summarized with an average: the ceiling (the lowest score) sets the readiness level, and blocking dimensions take priority regardless of the overall distribution. The framework page explains each dimension, the scoring rubric, and why the dimensions together predict whether AI agents can produce reliable work on a codebase.

AI coding agents amplify whatever structure already exists in a codebase. In brownfield codebases, the agent's output quality is constrained by the infrastructure surrounding it, not by the model's capability. This is the DORA 2025 "amplifier" finding and the conclusion of Fowler's Harness Engineering: Agent = Model + Harness. Readiness tools already exist for individual dimensions (coverage reporters, type checkers, dependency graphers, telemetry stacks), but none integrate them into a single AI-native readiness assessment. This skill does.

**Installation.**

```bash
git clone https://github.com/Kenogami-AI/codebase-readiness ~/dev/codebase-readiness
mkdir -p ~/.claude/skills
ln -s ~/dev/codebase-readiness/.claude/skills/assess-codebase-readiness.md ~/.claude/skills/
```

The symlink makes /assess-codebase-readiness available in any Claude Code session. Alternative: drop the skill directly into a project's .claude/skills/ directory to scope it to that project.

**Usage.** From inside the codebase you want to assess, run:

```
/assess-codebase-readiness
```

The skill will:

- Detect the repo's language(s) and framework(s).
- Collect signals per dimension (coverage reports, type configs, file size distribution, dependency graph, telemetry libraries, etc.).
- Read representative files where qualitative judgment is needed (especially dimensions 5 and 6).
- Score each dimension 1–5 conservatively, with cited evidence.
- Output a structured report with a readiness level, scorecard, evidence per dimension, and prioritized remediation plan.
- Recommend a brownfield strategy (remediate in place / strangler-fig / rebuild / isolate and bypass) based on the scorecard.

**Example output.**

```markdown
# Codebase readiness assessment

Repo: acme-platform
Primary language: TypeScript
Framework(s): Next.js 15, React 19
Source files (approx): 287
Date: 2026-04-19

## Codebase state: Brownfield
Two-year-old production codebase, 40+ contributors, significant legacy accumulation.

## Readiness level: Level 1 — Instrumented
Ceiling set by dimension 6 (Documented intent), scoring 1.

### Blocking dimensions at 1–2
- D5 (API directness) at 2 — opaque Factory abstractions produce confidently wrong agent code.

## Scorecard

| # | Dimension                          | Score | Evidence                                                 |
|:-:|------------------------------------|:-----:|----------------------------------------------------------|
| 1 | Test coverage and feedback latency | 3     | Coverage: 64%. CI p50: 8 min.                            |
| 2 | Type strictness                    | 4     | tsconfig strict: on. `any` count: 12 (0.3%).             |
| 3 | File size and context legibility   | 4     | p50: 142 lines. Largest: 847 (src/admin/dashboard.tsx).  |
| 4 | Module boundary clarity            | 3     | 12 top-level modules; 37 boundary violations detected.   |
| 5 | API directness                     | 2     | Factory abstraction in 89% of API call sites.            |
| 6 | Documented intent                  | 1     | No ADRs. 4 stale READMEs. No CLAUDE.md.                  |
| 7 | Observability                      | 3     | Structured logs, Sentry wired, no tracing.               |
| 8 | Dev and deploy simplicity          | 4     | Dev setup: 2 cmds. Deploy: auto on merge.                |
| 9 | Dependency and runtime currency    | 4     | Runtime current (Node 20). React 19. No abandoned libs.  |

## Prioritized remediation
1. Refactor the Factory abstraction in 3 high-traffic modules to direct API calls (blocking — D5).
2. Document intent for the top 5 critical modules — establish an ADR process, add CLAUDE.md (ceiling — D6).
3. Enforce module boundaries via lint rules; resolve the 37 existing violations.

## Recommendation: Remediate in place
The architecture is fundamentally sound. The blocker is D5 (API directness) — opaque Factory abstractions make call sites lie. Fix that first, then work up through the ceiling.
```

**Why a Claude Code skill (not a CLI, not a CI tool).** This is an AI-assisted assessment. Dimensions like "documented intent" can't be scored by static analysis alone; they require an LLM reading modules and judging whether intent is legible. Making this a skill puts the assessment where the agent already is, rather than fighting to reinvent the runtime.

**Why the lowest score sets the ceiling.** Agents fail at the weakest link. A codebase with six 5s and one 1 cannot support Rung 5 working mode; the one gap is where the agent produces confident wrong output. Averaging hides this.

**Why signal-then-judge, not pure static analysis.** Static-only tools are reproducible but miss the hardest dimensions. LLM-only tools are deep but non-deterministic. The skill's structure (static signals as evidence, LLM interpretation for scoring) balances both.
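Here is a minimal TypeScript sketch of the scoring rules described above (the ceiling rule, blocking-dimension priority, and the deferral credit). The type and function names are assumptions for illustration only; the skill itself is a prompt-driven assessment, not this code.

```typescript
// Illustrative sketch of the scoring rules: ceiling, blocking priority, and
// deferral credit. Names are hypothetical, not the skill's own implementation.

interface DimensionScore {
  id: number;                    // 1-9, matching the grid
  score: number;                 // 1-5, as judged by the assessment
  documentedDeferral?: boolean;  // the gap is intentional and documented
}

const BLOCKING = new Set([1, 2, 5]); // test feedback, type strictness, API directness

function readinessSummary(scores: DimensionScore[]) {
  // Deferral credit: an intentional, documented deferral scores one level
  // higher than the same gap left undocumented (capped at 5).
  const effective = scores.map(d => ({
    ...d,
    score: Math.min(5, d.score + (d.documentedDeferral ? 1 : 0)),
  }));

  // Ceiling rule: the lowest effective score sets the readiness level.
  // No averaging; six 5s and one 1 still yield level 1.
  const level = Math.min(...effective.map(d => d.score));

  // Blocking dimensions at 1-2 lead the remediation plan regardless of the
  // overall score distribution.
  const blockers = effective.filter(d => BLOCKING.has(d.id) && d.score <= 2);

  return { level, blockers };
}
```

Applied to the sample scorecard above (D6 at 1, D5 at 2), this yields level 1 with D5 flagged as the blocking priority, which matches the report's remediation order.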
**Roadmap.**

- v1 (this release): scorecard + remediation plan + mode recommendation.
- v2: /remediate-readiness, which takes a dimension and generates concrete PRs (add an ADR for module X, refactor the Factory at file:line, etc.).
- v3: a --save flag to track scores over time in .claude/readiness-history.json, plus trend graphs.
- v4: headless CI mode; JSON output for dashboards.

**License.** MIT. See LICENSE. Created and maintained by Kenogami.

The Codebase Readiness Grid is part of the AI-Native Transformation Framework, a broader framework for transforming engineering, operations, and roles around AI-native development. The readiness model synthesizes research from:
This analysis was written by the Genesis Park editorial team with the assistance of AI. The original article is available via the source link.