HN 표시: Redos-analyzer – Python용 정적 ReDoS 감지 및 자동 수정
hackernews
|
|
📦 오픈소스
#python
#redos
#정규식
#정적분석
#취약점
#취약점/보안
원문 출처: hackernews · Genesis Park에서 요약 및 분석
요약
redos-analyzer는 파이썬 코드의 정규표현식 거부 서비스(ReDoS) 취약점을 정적 분석으로 탐지하고 자동 수정하는 도구입니다. 이 도구는 파이썬의 내부 구문 트리(sre_parse AST)를 분석하여 위험한 패턴을 찾아내고, 원본과 동일한 매칭 결과를 유지하는 안전한 패턴(예: 원자 그룹 사용)으로 자동 교정 및 검증까지 수행합니다. 특히 주변 소스 코드의 맥락을 파악해 사용자 입력 등 실제 위험도에 따라 취약점을 분류하며, 온라인 체커가 놓칠 수 있는 내부 최적화 과정의 취약점도 정확히 찾아낼 수 있습니다.
본문
redos-analyzer is a static analysis tool that detects and automatically fixes Regular Expression Denial of Service vulnerabilities in Python codebases. Unlike online checkers that test patterns by running them against adversarial inputs, this tool works by walking Python's own internal sre_parse abstract syntax tree, the same representation CPython uses when it compiles a regex, so it sees exactly what the engine sees, including optimizations, without executing anything. When a dangerous pattern is found, the tool generates a corrected version using atomic groups, verifies that the fix preserves the original matching semantics, and reports whether the fix is safe to apply. The tool was validated against the top 20 most-downloaded PyPI packages and confirmed detections against real maintainer-acknowledged issues. Most regular expression engines, including Python's re module, use a backtracking strategy to find matches. When a pattern is written in a way that allows the same input to be matched in multiple overlapping ways, the engine is forced to explore an exponential number of possible combinations before it can confirm a non-match. The time taken grows not linearly but exponentially with input length. A pattern that takes one millisecond on ten characters may take minutes on thirty. An attacker who can supply input to a web server that applies such a pattern can cause a complete service stall with a single crafted request. This is not theoretical. In July 2019, Cloudflare suffered a global outage lasting 27 minutes because a single poorly written regex in their WAF rule set triggered catastrophic backtracking under normal traffic conditions, taking their entire edge network offline. Input length 10 -> ~0.001s Input length 20 -> ~0.884s (884x slower) Input length 30 -> ~hours Input length 40 -> longer than the age of the universe The doubling of input length roughly squares the time, which is the hallmark of exponential complexity. The pattern does not need to be obviously malicious. Most ReDoS vulnerabilities are introduced accidentally by developers who have no idea the pattern they wrote is dangerous. Three things separate this tool from existing options. First, it does not just detect, it generates a corrected pattern using atomic group syntax and runs a semantic equivalence check before presenting the fix, so you are not left to figure out the repair yourself. Second, it analyzes the sre_parse AST rather than the raw pattern string, which means it sees patterns the way Python actually compiles them, including prefix factoring and other internal normalizations that change what the backtracker encounters at runtime. Online checkers and tools that operate on raw regex text can miss vulnerabilities that are only visible at the AST level, or flag patterns that are actually safe after internal optimization. Third, the exploitability classifier reads the surrounding source code context to distinguish patterns that match user-supplied input from patterns applied to trusted internal strings, so findings are graded by real-world risk rather than raw structural danger. redos-analyzer is on PyPI. Install it with: pip install redos-analyzer It works in Google Colab too: !pip install redos-analyzer No setup needed after install. Import and use immediately. The core API is analyze() , which takes a pattern string and returns a list of warnings, and suggest_fix() , which takes a pattern and a warning and returns a corrected pattern along with a verification result. from redos_analyzer import analyze, suggest_fix warnings = analyze(r'(a+)+') print(warnings[0]) # [NESTED_QUANTIFIER] # Location : col 0-5 '(a+)+' # Sub-pattern : (a+)+ # Exploitability : POSSIBLE # Detail : Repeating quantifier '+' on 'a+' is nested inside the # outer repeating quantifier. On a failing match the engine # must explore an exponential number of ways to partition # the input across the two quantifiers. fix = suggest_fix(r'(a+)+', warnings[0]) print(fix['fixed']) # (?>a+)+ print(fix['verified']) # True print(fix['semantics_preserved']) # confirmed by verify_fix() The verified field means analyze() returned no warnings on the fixed pattern. The semantics_preserved check runs both patterns against a suite of test inputs and confirms the match results are identical. | Class | Severity | What it means | Example | |---|---|---|---| NESTED_QUANTIFIER | HIGH | A repeating quantifier wraps a body that itself contains a repeating quantifier, creating an exponential number of ways to partition input | (a+)+ | NULLABLE_BRANCH_IN_QUANTIFIER | HIGH | A quantifier's alternation contains a branch that can match the empty string, allowing infinite zero-progress iterations | (a|a?)+ | OVERLAPPING_ALTERNATION | MEDIUM | Two branches inside a repeating quantifier share overlapping first characters or one literal is a prefix of another | (a|aa)+ | LIKELY_TYPO | MEDIUM | The sequence (:? appears in the pattern, which is almost certainly a typo for the non-capturin
Genesis Park 편집팀이 AI를 활용하여 작성한 분석입니다. 원문은 출처 링크를 통해 확인할 수 있습니다.
공유