Memgraph Ingester: Speeding Up AI Agents

hackernews | 📦 Open source
#claude #gemini #opensource
Original source: hackernews · Summarized and analyzed by Genesis Park

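Summary

Memgraph Ingester is a tool that converts the structure of a Java codebase into a code and knowledge graph and stores it in Memgraph. This lets AI agents analyze a project accurately and efficiently through graph queries instead of text search. It manages multiple Java projects in a single instance without collisions, and it covers both source structure (packages, classes, methods, and so on) and engineering context.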

Body

Memgraph Ingester ingests the structural model of a Java codebase into Memgraph as a queryable code + memory knowledge graph, combining source structure with persistent engineering context (decisions, rules, findings, etc.). Optionally paired with the Memgraph MCP server, this enables your AI agent to reason over both code and accumulated project knowledge via graph queries instead of raw text search, which improves accuracy, reduces cost, and speeds up analysis. MCP is not required: the mgconsole utility can query the graph directly, which also reduces token usage.

You can use the code in this repo as-is, or fork it and customize it to your needs. Memgraph is free too. Please submit any issues or pull requests.

Memgraph Ingester creates two project-scoped graphs for a Java codebase:

- A Code graph under (:Project)-[:CONTAINS]->(:Code)
- A Memory graph under (:Project)-[:HAS_MEMORY]->(:Memory)

Every code and memory node is scoped by a project property, so multiple Java codebases can share the same Memgraph instance without collisions.

The Code graph stores Java source structure in a queryable, persistent form. The ingester walks the source tree with JavaParser and symbol resolution, then writes packages, files, classes, interfaces, annotations, methods, fields, inheritance, and within-project call relationships. The parser is configured for Java 25 syntax. It should handle most sources written for earlier Java versions too, but JavaParser is not a javac replacement and may still miss unsupported or edge-case constructs.

The Memory graph stores durable engineering context: decisions, ADRs, rules, findings, tasks, risks, questions, ideas, and domain notes. Memory items can refer to stable :CodeRef nodes, which are resolved back to the current code graph after ingestion. This lets agents query both structure (code) and knowledge (memory) without relying only on raw text search.
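The project scoping described above can be sketched as a small query builder. This is only an illustration: the helper name `project_anchor_query` is hypothetical, and it assumes the :Project node keeps the `--project` value in its `name` property. The relationship patterns and the `sourceRoots` / `lastIngested` properties are the ones named in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Cypher query for one project's graph anchors.
# Assumption: the :Project node stores the --project value in `name`.
project_anchor_query() {
  local project="$1"
  # Accept only a conservative character set so the name can be inlined safely.
  case "$project" in
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid project name: $project" >&2
      return 1
      ;;
  esac
  cat <<EOF
MATCH (p:Project {name: '$project'})-[:CONTAINS]->(c:Code)
OPTIONAL MATCH (p)-[:HAS_MEMORY]->(m:Memory)
RETURN p.name, c.sourceRoots, c.lastIngested, m IS NOT NULL AS hasMemory;
EOF
}

# Example (needs a running Memgraph and mgconsole on the PATH):
#   project_anchor_query my-project | mgconsole --host localhost --port 7687
```

Restricting the project name to a small character set sidesteps string escaping inside the Cypher literal; a real client would more likely pass the name as a query parameter.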
See doc/MEMORY.md for the Memory usage guide with prompt examples and Cypher recipes. See SCHEMA.md for the full graph model.

- Required: Java 25 JRE to run
- Required: Memgraph instance (or Docker)
- Optional: Java 25 SDK, Maven 3.9+ to build
- Optional: mgconsole

- Download the latest jar (v6.0.7 is the latest for now):

      wget https://github.com/ousatov-ua/memgraph-ingester/releases/download/v6.0.7/memgraph-ingester.jar

- Run Memgraph:

      docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0

- Ingest the project.

  Without classpath libs (weaker resolving):

      cd /path/to/your/java/project
      java -jar path/to/memgraph-ingester.jar \
        --source path/to/src \
        --bolt bolt://localhost:7687 \
        --project my-project \
        --wipe-project-code \
        --wipe-project-memories \
        --apply-schema

  With classpath libs (better resolving). Example for Maven projects:

      cd /path/to/your/java/project
      CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
      java -jar path/to/memgraph-ingester.jar \
        --source path/to/src \
        --bolt bolt://localhost:7687 \
        --project my-project \
        --wipe-project-code \
        --wipe-project-memories \
        --apply-schema \
        --classpath "$CP"

- Append knowledge for your agent:

      # GitHub Copilot
      curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
        | bash -s -- my-project

      # Claude
      curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
        | bash -s -- my-project

      # Codex
      curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
        | bash -s -- my-project

      # Gemini
      curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
        | bash -s -- my-project

- Enable the Memgraph MCP server for your AI agent (examples appear further down) OR put mgconsole on the PATH.

Maven coordinates: io.github.ousatov-ua (group), memgraph-ingester (artifact).

To start Memgraph:

- With Docker Compose:

      cd memgraph-platform
      docker-compose up -d

- Just Docker:

      docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0

Bolt listens on localhost:7687.

To build from source:

    git clone https://github.com/ousatov-ua/memgraph-ingester.git
    cd memgraph-ingester
    mvn clean package -Pshade -DskipTests

This produces a shaded fat JAR at target/memgraph-ingester.jar. Alternatively, use the published shaded fat JAR from the releases page.

To apply the schema with mgconsole:

    cat src/main/resources/io/github/ousatov/tools/memgraph/cypher/create-schema.cypher | mgconsole --host localhost --port 7687

This creates uniqueness constraints and lookup indexes for both the code graph and the memory graph. It is safe to re-run: existing constraints are reported and skipped.

You can also use the CLI. This command applies the schema to the Memgraph database first, then ingests the project:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --apply-schema

The next command wipes all data in the Memgraph database first, then applies the schema and ingests the project:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-all \
      --apply-schema

This wipes the Code graph for this project first:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code

This wipes both the Code and Memory graphs for this project first:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --wipe-project-memories

To verify the ingest, run:

    MATCH (p:Project)-[:CONTAINS]->(c:Code)
    RETURN p.name, c.sourceRoots, c.lastIngested;

You should see your project with a fresh lastIngested timestamp.
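The four ingest variants above differ only in their wipe and schema flags, so they can be summarized as a small lookup. The sketch below is not part of the ingester: the function name `ingest_flags` and the mode names (`first-run`, `refresh-code`, `reset-project`, `reset-all`) are hypothetical labels, while the flags themselves are the ones shown in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a mode name to the ingester flags shown above.
# The mode names are illustrative labels, not options of the tool itself.
ingest_flags() {
  case "$1" in
    first-run)     echo "--apply-schema" ;;                                 # apply schema, then ingest
    refresh-code)  echo "--wipe-project-code" ;;                            # drop this project's code graph first
    reset-project) echo "--wipe-project-code --wipe-project-memories" ;;    # drop code and memory graphs first
    reset-all)     echo "--wipe-all --apply-schema" ;;                      # wipe the whole database, re-apply schema
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Example (word splitting of the unquoted substitution is intentional):
#   java -jar target/memgraph-ingester.jar \
#     --source /path/to/your/java/project/src/main/java \
#     --bolt bolt://localhost:7687 \
#     --project my-project \
#     $(ingest_flags refresh-code)
```

Keeping `reset-all` as a separate, explicitly named mode makes it harder to wipe every project in a shared instance by accident.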
| Option | Short | Required | Default | Description |
|---|---|---|---|---|
| --source | -s | yes | | Root directory to scan (e.g. src/main/java) |
| --bolt | -b | yes | | Bolt URL, e.g. bolt://localhost:7687 |
| --project | -P | yes | | Logical project name. Namespaces all nodes. |
| --user | -u | no | | Memgraph username (empty by default) |
| --pass | -p | no | | Memgraph password (empty by default) |
| --threads | -t | no | 1 | Parser threads. Each thread gets its own Bolt session. |
| --wipe-project-code | none | no | false | Delete this project's code graph before ingesting |
| --wipe-project-memories | none | no | false | Delete this project's memory graph before ingesting |
| --apply-schema | none | no | false | Apply schema before ingesting |
| --wipe-all | none | no | false | Wipe all data (schema will be dropped first) |
| --incremental | none | no | false | Skip files whose last-modified timestamp matches the stored value |
| --watch | -w | no | false | Watch for changes in the source directory and automatically re-ingest |
| --classpath | none | no | | Additional classpath entries (JARs) for symbol resolution, separated by the platform path separator. Improves CALLS edge and type resolution coverage. |

--wipe-project-code only affects code nodes matching the given --project; other codebases in the same Memgraph instance are untouched, and the :Project anchor remains.

--wipe-project-memories only affects memory nodes matching the given --project; the code graph and the :Project anchor remain.

Large codebases ingest faster with multiple parser threads:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --threads 8

Each thread holds its own JavaParser and its own Bolt session. The Driver itself is shared.

Realistic speedup: don't expect linear scaling.
JavaParser work is CPU-bound and parallelizes well, but Memgraph Community serializes writes internally, so the write path bottlenecks quickly:

| Threads | Typical speedup | Bottleneck |
|---|---|---|
| 1 | 1× (baseline) | Sequential parse + write |
| 4 | ~2.5–3× | Write serialization starts |
| 8 | ~3–4× | Diminishing returns |
| 16+ | ~3–4× | Writes fully saturated |

4–8 threads is the sweet spot on most machines. Values higher than your CPU core count rarely help.

Determinism note: with --threads > 1, file processing order is non-deterministic. MERGE is idempotent, so results are identical, but log order will vary between runs.

For active development, use --watch (or -w) to monitor the source directory for changes. The ingester automatically re-ingests modified .java files, updates call edges, and refreshes code references whenever a change is detected:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --watch

Watch mode uses Java's WatchService for efficient OS-level notifications and includes a small debounce delay to handle multiple rapid writes (e.g., from IDE saves). It recursively watches all subdirectories under the --source root.

This repo ships scripts designed to be dropped into any project that's been ingested. They tell AI agents how to scope queries to the right project, how the schema is shaped, when to reach for the graph vs. filesystem search, and how to use Memories for durable decisions and follow-up context.
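Returning to the --threads guidance above (4 to 8 threads, and no more than the CPU core count), one way to pick a value automatically is sketched below. The helper names `pick_threads` and `detect_cores` and the cap of 8 are assumptions drawn from the speedup table, not behavior of the ingester.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing a --threads value.
# The cap of 8 mirrors the "diminishing returns" row in the table above.

# Report the number of online CPU cores, falling back to 1 if unknown.
detect_cores() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Clamp a core count (integer) into the range 1..8.
pick_threads() {
  local cores="$1"
  local max=8
  if [ "$cores" -lt 1 ]; then
    cores=1
  fi
  if [ "$cores" -gt "$max" ]; then
    echo "$max"
  else
    echo "$cores"
  fi
}

# Example:
#   java -jar target/memgraph-ingester.jar ... --threads "$(pick_threads "$(detect_cores)")"
```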
Use the bundled init-memgraph-claude.sh script, which fetches the template, substitutes the project name, and appends the result to the local CLAUDE.md. Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-claude.sh my-project

Or fetch-and-run straight from GitHub:

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
      | bash -s -- my-project

Commit the updated CLAUDE.md. Claude Code reads it on every session start.

Use the bundled init-memgraph-codex.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md. Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-codex.sh my-project

Or fetch-and-run straight from GitHub:

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. Codex reads it on every session start.

Use the bundled init-memgraph-gemini.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md. Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-gemini.sh my-project

Or fetch-and-run straight from GitHub:

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. Gemini reads it on every session start.
Use the bundled init-memgraph-github.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md. Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-github.sh my-project

Or fetch-and-run straight from GitHub:

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. GitHub Copilot reads it on every session start.

Claude Code needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in .claude.json for the target project:

    {
      "mcpServers": {
        "memgraph": {
          "command": "uvx",
          "args": ["mcp-memgraph"],
          "env": {
            "MEMGRAPH_URL": "bolt://localhost:7687",
            "MCP_READ_ONLY": "false"
          }
        }
      }
    }

Set MCP_READ_ONLY to "false" if you want Memories to be captured.

Verify it's registered:

    claude mcp list

Codex needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in ~/.codex/config.toml:

    [mcp_servers.memgraph]
    command = "uv"
    args = ["run", "--with", "mcp-memgraph", "--python", "3.13", "mcp-memgraph"]

    [mcp_servers.memgraph.env]
    MCP_TRANSPORT = "stdio"
    MEMGRAPH_URL = "bolt://localhost:7687"
    MEMGRAPH_USER = "memgraph"
    MEMGRAPH_PASSWORD = ""
    MEMGRAPH_DATABASE = "memgraph"
    MCP_READ_ONLY = "false"

    [mcp_servers.memgraph.tools.run_query]
    approval_mode = "approve"

An agent needs a writable MCP connection to create or update Memory nodes, which is why the example sets MCP_READ_ONLY = "false" while keeping run_query approval enabled.

Verify it's registered:

    codex mcp list

Gemini needs the Memgraph MCP server to actually run queries.
Minimal project-scoped config in ~/.gemini/settings.json:

    {
      "mcpServers": {
        "mcp-memgraph": {
          "command": "uvx",
          "args": ["mcp-memgraph"],
          "env": {
            "MEMGRAPH_URL": "bolt://localhost:7687",
            "MCP_READ_ONLY": "false"
          },
          "timeout": 5000,
          "trust": true
        }
      }
    }

An agent needs a writable MCP connection to create or update Memory nodes, which is why the example sets MCP_READ_ONLY to "false"; keep run_query approval enabled.

Verify it's registered:

    gemini mcp list

GitHub Copilot needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in ~/.copilot/mcp-config.json:

    {
      "mcpServers": {
        "mcp-memgraph": {
          "type": "local",
          "command": "uvx",
          "args": ["mcp-memgraph"],
          "env": {
            "MEMGRAPH_URL": "bolt://localhost:7687",
            "MCP_READ_ONLY": "false"
          },
          "tools": ["*"]
        }
      }
    }

By default, the ingester resolves types against the JDK and the project source tree. To improve CALLS edge coverage, EXTENDS/IMPLEMENTS FQN resolution, and field/return type resolution for external library types, pass dependency JARs via --classpath:

    # Use Maven to collect the classpath (compile + test scopes)
    CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --classpath "$CP"

Tip: Use -DincludeScope=test to include test-scoped dependencies (JUnit, Testcontainers, etc.). This ensures parameter types from those libraries resolve to FQNs in method signatures and field types. Without the matching JARs, external types fall back to simple names (e.g., Session instead of org.neo4j.driver.Session). Passing the classpath lets JavaParser resolve method calls whose parameters use types from Neo4j Driver, Spring, JUnit 5, picocli, etc., which dramatically increases the number of CALLS edges in the graph.

The graph goes stale as code changes.
Re-run the ingester with --wipe-project-code to refresh:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code

For faster re-runs, use --incremental to skip files that haven't changed.
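The idea behind --incremental, skipping files whose last-modified timestamp has not moved, can be illustrated outside the tool with `find`. This is only an analogy under stated assumptions: the ingester compares against a value stored in the graph, whereas the sketch compares against a local stamp file. The helper name `changed_java_files` and the `.last-ingest` stamp file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list .java files modified after a stamp file.
# Illustrates timestamp-based skipping; it is not the ingester's own logic.
changed_java_files() {
  local src_root="$1"
  local stamp="$2"
  # -newer selects files whose modification time is later than the stamp's.
  find "$src_root" -type f -name '*.java' -newer "$stamp"
}

# Example: see what changed since the last run, then refresh the stamp.
#   changed_java_files src/main/java .last-ingest
#   touch .last-ingest
```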

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
