mXSS: Mutation Cross-Site Scripting Explained

Original source: hackernews · Summary and analysis by Genesis Park

Summary

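mXSS is a cross-site scripting technique in which malicious code looks like harmless plain text during the initial HTML filtering (sanitizer) stage, but its structure mutates when the browser re-parses the HTML for final display, so that it executes as attack code. Attackers typically use elements such as form or noscript to deliberately trigger differences between the HTML parsing rules of the sanitizer and those of the browser. In particular, "desanitization", where a developer makes unnecessary tag modifications or context changes to the filtered output before delivering it to the client, is a key cause of such bypasses. Because this vulnerability stems from the complexity of HTML parsing and from differences between rendering environments, resolving it at the root requires developers to move beyond existing external libraries, which are unaware of context, and to rely on dedicated sanitizer implementations built into the browser.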

Body

Mutation Cross-Site Scripting (mXSS) is a sophisticated variation of the well-known Cross-Site Scripting (XSS) vulnerability. When an application needs to render user input as HTML safely, in order to support some HTML features, sanitization is the usual solution: allowing specific tags and attributes while stripping or encoding everything else. Unfortunately, this is not a straightforward task, because HTML is a syntax-tolerant language whose content may change, or "mutate", when parsed. mXSS takes advantage of this by supplying a payload that looks innocent when first parsed (during sanitization) but mutates into a malicious one when re-parsed (in the final stage of displaying the content).

The abstract idea is to find a way for a malicious string containing an XSS vector to be treated as raw text by the sanitizer but parsed as HTML when passed to the browser. There are several ways to cause this behavior.

Round-trip mutations occur because HTML content might change when it is reparsed. As the HTML spec puts it: "It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming."

The form element demonstrates this. A form element cannot have another form nested inside it: "Content model: Flow content, but with no form element descendants." As demonstrated in the HTML spec, the string <form id="outer"><div></form><form id="inner"><input> nevertheless creates a DOM tree with nested forms when first parsed, and the nested form is omitted once that tree is serialized and reparsed. The initial tree is:

html
├── head
└── body
    └── form id="outer"
        └── div
            └── form id="inner"
                └── input

The input element will be associated with the inner form element.
Now, if this tree structure is serialized and reparsed, the <form id="inner"> start tag will be ignored, and so the input element will be associated with the outer form element instead:

html
├── head
└── body
    └── form id="outer"
        └── div
            └── input

A second type of mXSS takes advantage of a mismatch between the sanitizer's parsing algorithm and the renderer's (e.g. the browser's). Take the noscript element, for example. Its parsing rule is: "If the scripting flag is enabled, switch the tokenizer to the RAWTEXT state. Otherwise, leave the tokenizer in the data state." This means that the body of the noscript element is parsed differently depending on whether JavaScript is enabled or disabled. It is reasonable for JavaScript to be disabled in the sanitizer stage but enabled in the renderer. This behavior is not wrong by definition, but it can lead to sanitizer bypasses. Many other parser differentials are possible, caused by different HTML versions, content type mismatches, and more.

Desanitization is a crucial mistake that applications make when they interfere with the sanitizer's output before sending it to the client, essentially undoing the sanitizer's work. Any small change to the markup might change how the parser behaves, resulting in a sanitization bypass. We've discussed this issue before in several blog posts, where we identified vulnerabilities in various applications.

HTML parsing is complex and can differ depending on the context; for example, parsing a whole document differs from parsing a fragment in Firefox (see Browser Specific section in the main page). When handling the transition from sanitizing to rendering in the browser, developers might mistakenly change the context in which the data is rendered, causing a parsing differential and eventually a sanitizer bypass.
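The noscript differential described above can be illustrated with a toy model. The following Python sketch is not a real HTML tokenizer: the regular expression, the function name start_tag_names, and the payload are all assumptions made for this illustration. It only mimics the one rule quoted above, treating the content of noscript as raw text when the scripting flag is set and as ordinary markup otherwise.

```python
import re

# Toy pattern for a start or end tag: a tag name followed by attributes,
# where quoted attribute values are allowed to contain ">".
TAG = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)((?:[^>"\']|"[^"]*"|\'[^\']*\')*)>')


def start_tag_names(markup, scripting):
    """Return the names of the start tags this toy tokenizer emits."""
    names = []
    pos = 0
    while True:
        match = TAG.search(markup, pos)
        if match is None:
            return names
        is_end_tag = match.group(1) == "/"
        name = match.group(2).lower()
        pos = match.end()
        if is_end_tag:
            continue
        names.append(name)
        if name == "noscript" and scripting:
            # RAWTEXT: everything up to the next "</noscript" is plain text,
            # so skip it without looking for tags inside.
            end = markup.lower().find("</noscript", pos)
            if end == -1:
                return names
            pos = end


# Illustrative payload (an assumption, not taken from the original article).
payload = '<noscript><p title="</noscript><img src=x onerror=alert(1)>">'

as_sanitizer = start_tag_names(payload, scripting=False)
as_browser = start_tag_names(payload, scripting=True)
print(as_sanitizer)  # ['noscript', 'p']
print(as_browser)    # ['noscript', 'img']
```

With scripting off, the img markup is only text inside a title attribute; with scripting on, the attribute text ends the noscript element early and the img start tag becomes real. A sanitizer that parses in the first mode would therefore never see the element that the browser later creates.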
Modern sanitizers are not aware of the context in which their output will be placed; browsers' built-in sanitizers, once implemented, are intended to solve this.
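Desanitization, as described earlier, can be shown in miniature as well. In the sketch below, the sanitizer output and the post-processing step (decoding entities after sanitization) are hypothetical, and Python's html.parser merely stands in for a browser; it is not a spec-compliant HTML5 parser. The point is only that a seemingly harmless change to already-sanitized markup can turn attribute text into a new element.

```python
from html import unescape
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records every start tag (name and attributes) the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def start_tags(markup):
    """Parse markup and return the list of (tag name, attributes) pairs."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical sanitizer output: a single harmless <p> whose title attribute
# contains markup-looking text, with the double quote safely encoded.
sanitized = '<p title="&quot;><img src=x onerror=alert(1)>">hi</p>'

# Hypothetical post-processing step applied after sanitization:
# decoding entities, which turns &quot; back into a literal quote.
desanitized = unescape(sanitized)

safe_names = [name for name, _ in start_tags(sanitized)]
mutated_names = [name for name, _ in start_tags(desanitized)]
print(safe_names)     # ['p']
print(mutated_names)  # ['p', 'img']
```

After the extra decoding step, the quote closes the title attribute early, and the text that used to live inside the attribute is parsed as an img element with an onerror handler.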

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
