Experiment: Replacing Claude Code's context stuffing with semantic grep
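Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
The open-source tool 'git-semantic' uses tree-sitter to parse code into chunks, generates a vector embedding for each chunk, and stores the results on an orphan Git branch so the whole team can share them. After a one-time full index, it works incrementally and reprocesses only the files that changed. When indexing runs in a CI/CD pipeline, everyone else can run natural-language semantic search without an API key of their own. It also integrates with coding agents such as Claude Code, Cursor, and GitHub Copilot, replacing plain text-based code search with more efficient context retrieval.
Body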
Semantic search for your codebase. git-semantic parses every tracked file with tree-sitter, generates vector embeddings per chunk, and stores them on a dedicated orphan Git branch that mirrors your source tree, so the whole team can share embeddings without re-indexing.

    main branch              semantic branch (orphan)
    ──────────────────       ──────────────────────────────
    src/main.rs          →   src/main.rs          ← [{start_line, end_line, text, embedding}, ...]
    src/db.rs            →   src/db.rs            ← [{...}, ...]
    src/chunking/mod.rs  →   src/chunking/mod.rs

- git-semantic index parses all tracked files with tree-sitter, embeds each chunk, and commits the mirrored JSON files to the semantic orphan branch. On subsequent runs it only re-embeds files that changed since the last index (incremental).
- git push origin semantic shares the embeddings with the team.
- Contributors run git fetch origin semantic + git-semantic hydrate to populate their local SQLite search index; no re-embedding is needed.
- git-semantic grep runs KNN vector similarity search against the local index.

Indexing only needs to happen once: whoever runs it pushes the semantic branch and the whole team benefits. Nobody else needs an API key or has to re-embed anything. You can run indexing manually from any machine, or automate it in your CI/CD pipeline so embeddings stay fresh after every merge.

    # Anyone with an API key runs this once (or after significant changes)
    git-semantic index
    git push origin semantic

    # Everyone else
    git fetch origin semantic
    git-semantic hydrate
    git-semantic grep "..."
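To make the "KNN vector similarity search" step concrete, here is a minimal brute-force sketch in Python. It is not the tool's implementation (the project layout indicates the real index is SQLite with sqlite-vec); the chunk records follow the {start_line, end_line, text, embedding} shape shown above, while the `path` field, the 3-dimensional toy vectors, and the function names are assumptions made up for illustration.

```python
import math

# Toy chunk records. The "path" field and the tiny 3-d vectors are
# illustrative assumptions; real embeddings have far more dimensions.
CHUNKS = [
    {"path": "src/auth.rs", "start_line": 1, "end_line": 20,
     "text": "fn verify_token(...)", "embedding": [0.9, 0.1, 0.0]},
    {"path": "src/db.rs", "start_line": 5, "end_line": 40,
     "text": "fn open_connection(...)", "embedding": [0.0, 0.2, 0.9]},
    {"path": "src/errors.rs", "start_line": 1, "end_line": 15,
     "text": "enum AppError { ... }", "embedding": [0.1, 0.9, 0.1]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query_embedding, chunks, k=2):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    scored = [(cosine_similarity(query_embedding, c["embedding"]), c) for c in chunks]
    # Sort on the score only, so chunk dicts are never compared to each other.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Stand-in for the embedding of a natural-language query.
    query = [0.8, 0.3, 0.0]
    for score, chunk in knn_search(query, CHUNKS, k=2):
        print(f"{score:.3f} {chunk['path']}:{chunk['start_line']}-{chunk['end_line']}")
```

A linear scan like this is fine for a toy example. A vector index exists precisely to avoid scoring every chunk on each query in a large repository.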
Add .github/workflows/semantic-index.yml to your repository and indexing happens automatically on every merge to main:

    name: Semantic Index
    on:
      push:
        branches: [main]
    jobs:
      index:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
            with:
              fetch-depth: 0
              token: ${{ secrets.GITHUB_TOKEN }}
          - name: Install git-semantic
            run: cargo install git-semantic
          - name: Index codebase
            env:
              OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
            run: git-semantic index
          - name: Push semantic branch
            run: |
              git config user.name "github-actions[bot]"
              git config user.email "github-actions[bot]@users.noreply.github.com"
              git push origin semantic

Requirements:

- Rust 1.65 or higher
- Git 2.0 or higher

Install with cargo:

    cargo install git-semantic

Or build from source:

    git clone https://github.com/ccherrad/git-semantic.git
    cd git-semantic
    cargo install --path .

Parses and embeds files, then commits the result to the semantic orphan branch.

    git-semantic index

- First run: full index of all tracked files, writes .indexed-at with the current HEAD SHA
- Subsequent runs: incremental; diffs against the last indexed SHA and re-embeds only added, modified, renamed, or deleted files
- Respects .gitignore (uses git ls-files)
- Skips binary files
- Files with unrecognized extensions are stored as a single chunk
- Creates the semantic branch automatically on first run

Reads the semantic branch and populates the local .git/semantic.db search index.

    git-semantic hydrate

Attempts to fetch origin/semantic first, then falls back to the local branch.

Search code semantically using natural language.

    git-semantic grep "authentication logic"
    git-semantic grep "error handling" -n 5

Injects code search instructions into CLAUDE.md so coding agents automatically use git-semantic grep instead of git grep.

    git-semantic agentic-setup

- Appends instructions to an existing CLAUDE.md, or creates one if it doesn't exist
- Idempotent: safe to run multiple times
- Works with Claude Code, Cursor, GitHub Copilot (via .cursor/rules or .github/copilot-instructions.md equivalents)

Configure the embedding provider.

    git-semantic config --list
    git-semantic config gitsem.provider openai
    git-semantic config gitsem.provider onnx
    git-semantic config --get gitsem.provider
    git-semantic config --unset gitsem.onnx.modelPath

To use OpenAI:

    export OPENAI_API_KEY="sk-..."
    git-semantic config gitsem.provider openai

To use a local ONNX model:

    git-semantic config gitsem.provider onnx
    git-semantic config gitsem.onnx.modelPath /path/to/model.onnx

Supported languages: Rust, Python, JavaScript, TypeScript, Java, C, C++, Go

Project layout:

    git-semantic/
    ├── src/
    │   ├── main.rs              # CLI and command handlers
    │   ├── models.rs            # CodeChunk data structure
    │   ├── db.rs                # SQLite + sqlite-vec search index
    │   ├── embed.rs             # Embedding generation
    │   ├── semantic_branch.rs   # Orphan branch read/write via git worktree
    │   ├── embeddings/          # OpenAI and ONNX provider implementations
    │   └── chunking/            # tree-sitter parsing and language detection
    ├── Cargo.toml
    └── README.md

Build and test:

    cargo build --release
    cargo test

License: MIT OR Apache-2.0
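The incremental behavior described for git-semantic index (diff against the last indexed SHA, then reprocess only what changed) can be sketched roughly as follows. The source does not show how the tool computes its diff, so this is an assumption-laden illustration: it assumes the change list comes from something like `git diff --name-status <last-indexed-sha> HEAD`, and the function name `plan_incremental_update` is hypothetical. Treating deletions and the old side of a rename as "remove the mirrored file" is also an assumption.

```python
def plan_incremental_update(name_status_output):
    """Split `git diff --name-status` output into paths to embed and paths to drop.

    Each line is tab-separated: a status code followed by one path, or by two
    paths (old, new) for renames and copies.
    """
    to_embed, to_remove = [], []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("R"):
            # Rename, e.g. "R100\told\tnew": drop the old path, embed the new one.
            to_remove.append(fields[1])
            to_embed.append(fields[2])
        elif status.startswith("C"):
            # Copy, e.g. "C75\tsrc\tdst": only the destination is new.
            to_embed.append(fields[2])
        elif status == "D":
            # Deleted files have nothing to embed; their mirrored data is dropped.
            to_remove.append(fields[1])
        elif status in ("A", "M"):
            to_embed.append(fields[1])
        # Other statuses (such as type changes) are ignored in this sketch.
    return to_embed, to_remove


if __name__ == "__main__":
    # Hand-written sample in the format printed by `git diff --name-status`.
    sample = (
        "M\tsrc/main.rs\n"
        "A\tsrc/embed.rs\n"
        "D\tsrc/old.rs\n"
        "R100\tsrc/store.rs\tsrc/db.rs\n"
    )
    embed, remove = plan_incremental_update(sample)
    print("re-embed:", embed)
    print("remove:  ", remove)
```

Keeping the planning step separate from the embedding step means the expensive API calls are made only for paths in the first list, which is the point of storing the last indexed SHA.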
This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.