Claude Code doesn't trust Claude with permissions
🔬 Research
#anthropic
#claude
#claude code
#llm
#permissions
#review
#security
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
Claude Code uses an LLM for most of its functionality, including tool selection and code generation, but its permission system is built entirely from deterministic code such as rule matching and regular expressions. The bash tool in particular scrutinizes commands through 23 independent security validators, and attempts to modify sensitive paths such as .git/ are subject to bypass-immune rules that force user approval. The LLM participates only as a last resort in "auto mode," when no user is present, and on any error the system always defaults to asking the user, prioritizing code-based security over the model's judgment.
Full text
The Claude Code source leak showed that most of the system runs on LLM calls: tool selection, code generation, memory extraction, context management. But there is one subsystem where the LLM is almost entirely absent: permissions. The system that decides whether a tool call should be allowed, denied, or shown to the user for approval is deterministic code: rule matching, glob patterns, regex validators, hardcoded path checks. Where it actually mattered, they didn't trust the model until recently.

The decision pipeline

Every tool call runs through hasPermissionsToUseToolInner before execution. The logic is a priority chain:

- Check tool-level deny/ask rules (glob pattern matching against a settings hierarchy)
- Run the tool's own checkPermissions() method (per-tool code, not LLM)
- Check for bypass-immune safety conditions (sensitive paths, content-specific rules)
- If bypassPermissions mode is active and nothing above fired, allow
- Check tool-level allow rules
- Default: ask the user

This is all just code. No model inference, no classification, no probability distribution. A tool call either matches a rule or it doesn't.

The bash tool alone has a six-stage pipeline for its checkPermissions() step: compound command splitting, safe wrapper stripping, rule matching per subcommand, 23 independent security validators, path constraint checks, and sed/mode validation.

The security validators are worth a closer look. The system pre-computes four different views of each command:

- Raw / unchanged: bash -c "rm '$target'"
- Double-quotes stripped: bash -c rm '$target'
- Fully unquoted: bash -c rm $target
- Quote-chars preserved: bash -c " ' '"

Each validator then picks the representation it needs without re-parsing. Validators cover command substitution patterns, Zsh-specific dangerous builtins, IFS injection, brace expansion, Unicode whitespace tricks, and more. The permission system has more in common with a traditional RBAC layer than with an LLM prompt.
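The priority chain described above can be sketched in a few lines of Python. This is a minimal illustration, not the leaked implementation: every name here (check_permission, the rules dict, BYPASS_IMMUNE_PREFIXES) is hypothetical, and the per-tool checkPermissions() step is collapsed into a comment. What it shows is the structural point: bypass-immune conditions are evaluated before the bypass flag, so control flow, not policy, makes them unskippable.

```python
from fnmatch import fnmatch

# Hypothetical sketch of the priority chain; all names are illustrative.
BYPASS_IMMUNE_PREFIXES = (".git/", ".claude/", ".vscode/")

def check_permission(tool: str, path: str, rules: dict, bypass: bool) -> str:
    """Return 'deny', 'ask', or 'allow' using only deterministic rule matching."""
    # 1. Tool-level deny/ask rules: glob patterns from the settings hierarchy.
    for pattern in rules.get("deny", []):
        if fnmatch(path, pattern):
            return "deny"
    for pattern in rules.get("ask", []):
        if fnmatch(path, pattern):
            return "ask"
    # 2. (Elided here: the tool's own checkPermissions() method would run next.)
    # 3. Bypass-immune safety conditions fire BEFORE the bypass check,
    #    so no mode, flag, or setting can skip them.
    if tool == "write" and path.startswith(BYPASS_IMMUNE_PREFIXES):
        return "ask"
    # 4. Bypass mode allows anything the checks above did not catch.
    if bypass:
        return "allow"
    # 5. Tool-level allow rules.
    for pattern in rules.get("allow", []):
        if fnmatch(path, pattern):
            return "allow"
    # 6. Default: ask the user.
    return "ask"
```

Even with bypass=True, a write under .git/ reaches step 3 before step 4 and still returns "ask"; reordering those two blocks would silently destroy the guarantee.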
Bypass-immune checks

Some checks can't be bypassed regardless of the permission mode: writes to .git/, .claude/, .vscode/, and shell config files always prompt the user. This is hardcoded. The same goes for tools that require user interaction and for content-specific ask rules. These fire before the bypass check in the pipeline, so there is no mode, flag, or setting that can skip them. The order of operations is the guarantee: the bypass literally cannot run before the immune checks have had their say. It's enforced by control flow, not by policy.

The one LLM path: auto mode

There is exactly one place where an LLM participates in permission decisions: auto mode. The feature is gated behind the TRANSCRIPT_CLASSIFIER feature flag. Anthropic has shipped auto mode publicly since the release, with the explicit caveat that it "reduces risk but doesn't eliminate it."

The deterministic pipeline still runs first. The classifier only runs as a fallback: if the code-based pipeline can resolve the permission (allow or deny), the classifier is never called. It activates only when the code reaches "ask" and the system is in auto mode, instead of prompting a human.

The failure mode is always "ask a person." A classifier API error results in a deny. Three consecutive denials trigger a fallback to human prompting. Twenty total denials do the same and reset the counter. If the transcript is too long for the classifier's context window, it falls back to human prompting. The system never defaults to allow on error.

The design philosophy

The split is stark. Claude Code uses an LLM for nearly everything: deciding which tools to call, generating code, extracting memories, managing context. But when the question changes from "what should the model do" to "should the model be allowed to do it," the LLM is barely in the loop, and only as a fallback. Anthropic built a product around a model and then built the permission system specifically to not depend on that model.
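The fail-closed fallback rules for auto mode can be sketched the same way. The thresholds (3 consecutive denials, 20 total) come from the article; everything else, including the class and method names and the use of transcript length as a stand-in for the context-window check, is an illustrative assumption. The key property is that every exceptional path resolves to "deny" or "ask a human," never to "allow."

```python
# Illustrative sketch of the fail-closed fallback; names are hypothetical,
# thresholds are the ones reported in the article.
MAX_CONSECUTIVE_DENIALS = 3
MAX_TOTAL_DENIALS = 20

class AutoModeFallback:
    """Runs only when the deterministic pipeline says 'ask' in auto mode."""

    def __init__(self, classifier, context_limit: int):
        self.classifier = classifier        # callable: transcript -> "allow"/"deny"
        self.context_limit = context_limit
        self.consecutive_denials = 0
        self.total_denials = 0

    def decide(self, transcript: str) -> str:
        # Transcript too long for the classifier's context window: ask a person.
        if len(transcript) > self.context_limit:
            return "ask_human"
        try:
            verdict = self.classifier(transcript)
        except Exception:
            # A classifier API error results in deny; never allow on error.
            verdict = "deny"
        if verdict == "deny":
            self.consecutive_denials += 1
            self.total_denials += 1
            if self.total_denials >= MAX_TOTAL_DENIALS:
                self.total_denials = 0      # twenty total denials: reset and escalate
                return "ask_human"
            if self.consecutive_denials >= MAX_CONSECUTIVE_DENIALS:
                return "ask_human"
            return "deny"
        self.consecutive_denials = 0        # an allow resets the streak
        return verdict
```

Note the asymmetry: errors collapse into "deny," and repeated denies escalate to a human, so the only way to reach "allow" is an explicit, successful classifier verdict.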
The only exception is fail-closed, heavily gated, and was kept internal until they were confident enough to ship it as a research preview. That says something about where the team thinks model judgment is reliable and where it isn’t.
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.