Context-pnpm – Scoring TypeScript monorepo files by AI context waste

hackernews | Hardware/Semiconductors
#openai #hardware/semiconductors

Summary

When working with AI assistants, unnecessary implementation details get pulled into the context window and waste tokens. To address this, the author proposes optimizing TypeScript monorepo structure. A scoring formula was developed that combines a file's total token count, its exported-API token count, and how often it is imported; when the score exceeds 60, extracting the file into a separate package pays off. A CLI tool automates this analysis and package restructuring, helping developers manage their monorepos more efficiently.

Why it matters

Body

Goal: Structure node packages so AI agents read less and understand more. Specifically: measure how TypeScript monorepo structure affects context window consumption, and build a tool that quantifies the waste and fixes it.

Repository: markkovari/context-pnpm

When I work on different parts of a codebase with AI assistants, the context window fills up fast. Every file the assistant reads to understand a dependency is loaded in full, including implementation details it will never touch. For a busy utility module, that's thousands of tokens of waste, on every session, across every file that imports it. I kept hitting conversation compacting earlier than expected, and it was slowing me down.

My theory was that the shape of your modules (how many packages you have, how big they are, how nested they are) directly influences how many tokens get burned just loading context. But I didn't have numbers. I didn't know the threshold where splitting a module actually pays off versus adding maintenance overhead for no gain. So I built a tool to find out.

I wanted to answer a simple question: given a TypeScript codebase, which files are costing you the most tokens per AI session, and is it worth restructuring them?

The core insight is that file size alone doesn't predict waste. What matters is how much of a file is implementation versus exported API, multiplied by how many files import it. A 10,000-token type declaration file with 98% exports barely registers. A 700-token utility module with a large implementation body, imported by 18 files, costs more than almost anything else.
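The comparison above can be made concrete with a small sketch. The type declaration file's numbers (10,000 tokens, 98% exports) and the utility module's total (700 tokens, 18 importers) come from the text; the utility module's 100-token exported surface and the type file's importer count are hypothetical figures chosen for illustration:

```typescript
// Waste per session ≈ implementation tokens × number of importers,
// where implementation tokens = total tokens − exported-surface tokens.
function wastePerSession(
  totalTokens: number,
  surfaceTokens: number,
  importerCount: number
): number {
  return (totalTokens - surfaceTokens) * importerCount;
}

// 10,000-token type declaration file, 98% exports (importer count of 18
// assumed for a like-for-like comparison): barely any hidden implementation.
const typeDecls = wastePerSession(10_000, 9_800, 18); // (10000 − 9800) × 18 = 3600

// 700-token utility module, hypothetical 100-token exported surface,
// imported by 18 files: most of the file is implementation.
const utilModule = wastePerSession(700, 100, 18); // (700 − 100) × 18 = 10800

console.log({ typeDecls, utilModule });
```

Despite being over fourteen times smaller, the utility module wastes roughly three times as many tokens per session, which is exactly why raw file size is a poor predictor.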
I landed on this scoring formula:

    score = (total_tokens − surface_tokens) × importer_count

| Term | Definition |
|---|---|
| total_tokens | Full file token count (tiktoken cl100k_base) |
| surface_tokens | Only the exported declarations |
| importer_count | Number of files that import this one |

💡 If the score is above 60 (the overhead of a package.json + index.ts boilerplate), extraction into a separate workspace package is worth it. Below that, leave it alone.

External packages

| Package | Purpose |
|---|---|
| tiktoken (OpenAI) | Accurate token counting with cl100k_base encoding |
| typescript-estree (typescript-eslint) | ESTree-compatible AST parser to distinguish exported surface from implementation body |

Internal packages

| Package | Role |
|---|---|
| analyzer | Reads folders via glob pattern, returns total tokens, surface tokens, and importer counts |
| estimator | Projects token savings per AI session from analyzer output |
| cli | User-facing tool: analyze, estimate, scaffold, verify, rebalance. Dry-run by default; nothing written without --apply |
| scaffolder | Rewires imports/exports, registers new pnpm workspace packages, generates minimal index.ts re-export surfaces |
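The scoring rule and its extraction threshold can be sketched as follows. This is a minimal illustration of the formula described above, not the tool's actual API; the FileStats shape and function names are hypothetical:

```typescript
// Hypothetical per-file measurements, matching the terms in the formula.
interface FileStats {
  totalTokens: number;   // full file token count (tiktoken cl100k_base)
  surfaceTokens: number; // tokens of the exported declarations only
  importerCount: number; // number of files that import this one
}

// Overhead of a package.json + index.ts boilerplate, per the article.
const EXTRACTION_THRESHOLD = 60;

// score = (total_tokens − surface_tokens) × importer_count
function score(f: FileStats): number {
  return (f.totalTokens - f.surfaceTokens) * f.importerCount;
}

function shouldExtract(f: FileStats): boolean {
  return score(f) > EXTRACTION_THRESHOLD;
}

// A tiny helper with little hidden implementation stays put:
// (80 − 50) × 2 = 60, which is not above the threshold.
const small: FileStats = { totalTokens: 80, surfaceTokens: 50, importerCount: 2 };

// A heavy utility module with many importers is worth extracting:
// (700 − 100) × 18 = 10800.
const heavy: FileStats = { totalTokens: 700, surfaceTokens: 100, importerCount: 18 };

console.log(shouldExtract(small), shouldExtract(heavy));
```

Keeping the threshold as an explicit constant mirrors the article's reasoning: extraction is only worthwhile once the saved implementation tokens outweigh the fixed boilerplate cost of a new workspace package.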