Leveling Up Secure Code Review with Claude Code
hackernews
🔬 Research
#claude
#claude code
#review
#developer tools
#secure code review
#security assessment
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This article covers how to use Claude Code effectively when analyzing application code written in unfamiliar languages or frameworks during a penetration test. To reduce the false positives that come from blindly telling the model to find vulnerabilities, the author presents a methodology that uses Claude as an assistant for understanding the code while the tester performs the security review directly. In particular, the author builds a custom system prompt that assigns security and educational roles, and strongly recommends running the model on self-controlled infrastructure, such as a local model or Amazon Bedrock, rather than public AI services, to prevent leaking client source code.
Body
TL;DR: Claude Code is a force multiplier when performing secure code reviews during an assessment. In this post, we discuss how to leverage Claude Code to produce digestible output that helps us better understand an analyzed code base while surfacing secure and insecure coding patterns.

After years of manually reviewing source code for vulnerabilities during web application penetration tests, I’ve found modern LLMs are a great boon to the process. They greatly assist in scenarios where you are dropped into an application built with unfamiliar languages, frameworks, and tech stacks. A tool like Claude Code can surface security hot spots with a breadth and fidelity that pre-LLM software-characterization tooling like Microsoft’s AppInspector could only dream of.

My methodology is to use Claude Code to empower my own secure code review (i.e., to understand how the application works) rather than relying on it to assess and surface vulnerabilities for me. I’ve found that the latter results in a long string of “findings” that I need to parse and process, most of which end up being false positives. Instead, I use Claude in a targeted manner to simplify code review while it provides its own security annotations and insight along the way, which I can accept or reject as valid in the context presented.

In this post, I will go over how I utilize Claude Code in application penetration tests to understand and analyze application code more efficiently. We will review the source code for BloodHound Community Edition (BHCE) to understand how some of its more complex tasks work, and after that, look at Elad’s BadWindowsService to see how our system prompt does with a different application type.

OPSEC NOTE: I do not recommend using public Claude.ai models for any private intellectual property (IP) code.
Instead, with client approval, run a local model or host your own in something like Amazon Bedrock, and configure Claude Code to use that instead. Keeping the code base in your own controlled and secure infrastructure maintains the confidentiality of your customer’s IP.

The System Prompt

Before rappelling into the depths of application code, it is beneficial to provide Claude Code with contextual information about the target application and guidance on what we want to see in a response. This results in fewer false positives and more useful responses than just telling Claude to “Find vulns in this app plz”. A system prompt is the perfect place to supply this.

When working with an LLM, the client application maintains a rolling, limited “memory” of your conversation, called a context window, which is used during subsequent inference (i.e., generating a response). In practice, the model doesn’t have its own memory: the entirety of this context window is transparently sent with each subsequent prompt to the model. The size of the context window varies by model, so it is important to understand that the LLM will not know any information that has been purged from the context window. Fortunately, GUI and CLI tools sometimes surface the context window in use and its available capacity to us as users. Companies are also starting to leverage long-term memories that persist outside of the context window to improve continuity, which you will see an example of below. The system prompt is typically present in every prompt as a guiding prefix in the model’s context window.
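Claude Code can be pointed at a Bedrock-hosted model through environment variables. A minimal sketch, assuming AWS credentials with Bedrock model access are already configured; the region and model ID below are examples and will differ per account:

```shell
# Route Claude Code through Amazon Bedrock instead of the public Anthropic API.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1   # example: region where your Bedrock model access is enabled

# Optionally pin a specific model; this inference-profile ID is illustrative only.
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'

claude   # start Claude Code; requests now go to your Bedrock endpoint
```

The same approach works in CI or on a hardened jump box, which is where client code review typically happens anyway.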
I like to provide pertinent application and code base information to the model via the system prompt, such as:

- The persona (i.e., role) that the model is to operate as
- Response content, structure, and formatting expectations
- A description of the application that includes the name, type of application, purpose, data storage in use, and a definition of its authorization model
- Application URL, language, and frameworks
- A link to or attachment of the API spec
- The full folder paths for different app components (e.g., frontend, router, controllers, views, daemons)

Below is the system prompt that I use. It includes a default analysis persona that adds security context to any responses, and a conditional educational persona for when I want to learn more about a coding pattern, language behavior, or framework feature. The educational persona gets verbose and is great at explaining things in the context of the application being analyzed. I’ve instructed the model to switch to educational mode when I include [TeachMe] in the prompt.

The Analysis Methodology and Response Guidance sections are where I specify how to perform analysis and how I want responses to be structured. Ultimately, I want a digestible, security-focused code flow walkthrough without needing to reference the codebase itself to fill in any blanks. This is done with code snippets and a narrative throughout the data flow walkthrough. I also want it to point out both positive observations and security control gaps to investigate further. Lastly, I want it to communicate how confident it is in its observations.
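The author’s full system prompt is not included in this excerpt. As a rough, hypothetical sketch of the structure described above (section names and details are assumptions, not the author’s actual prompt):

```markdown
# Persona
You are a senior application security engineer assisting a human reviewer
with a secure code review. Annotate findings; do not declare vulnerabilities
without supporting code evidence.

## Conditional: Educational Persona
When the prompt contains [TeachMe], switch to a verbose educational mode and
explain the coding pattern, language behavior, or framework feature in the
context of this application.

# Application Context
- Name / type / purpose: ...
- Data storage in use: ...
- Authorization model: ...
- URL, language, frameworks: ...
- Component paths (frontend, router, controllers, views, daemons): ...
- API spec: ...

# Analysis Methodology
Walk each data flow end to end, quoting the relevant code snippets inline.

# Response Guidance
Produce a security-focused walkthrough readable without the codebase at hand.
Call out both positive observations and control gaps worth investigating,
and state your confidence in each observation.
```

The [TeachMe] trigger keeps the default output terse while leaving a cheap escape hatch for learning an unfamiliar framework mid-review.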
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.