Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.
String unsafe =
"<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
Discussion
A cross-site scripting attack against your site can really ruin your day, not to mention your users'. Many sites avoid XSS attacks by not allowing HTML in user submitted content: they enforce plain text only, or use an alternative markup syntax like wiki-text or Markdown. These are seldom optimal solutions for the user, as they lower expressiveness, and force the user to learn a new syntax.
A better solution may be to use a rich text WYSIWYG editor (like CKEditor or TinyMCE). These output HTML, and allow the user to work visually. However, their validation is done on the client side: you need to apply a server-side validation to clean up the input and ensure the HTML is safe to place on your site. Otherwise, an attacker can avoid the client-side Javascript validation and inject unsafe HMTL directly into your site
The jsoup whitelist sanitizer works by parsing the input HTML (in a safe, sand-boxed environment), and then iterating through the parse tree and only allowing known-safe tags and attributes (and values) through into the cleaned output.
It does not use regular expressions, which are inappropriate for this task.
jsoup provides a range of Whitelist configurations to suit most requirements; they can be modified if necessary, but take care.
The cleaner is useful not only for avoiding XSS, but also in limiting the range of elements the user can provide: you may be OK with textual a, strong elements, but not structural div ortable elements.
See also
See the XSS cheat sheet and filter evasion guide, as an example of how regular-expression filters don't work, and why a safe whitelist parser-based sanitizer is the correct approach.
See the
Cleanerreference if you want to get aDocumentinstead of a String returnSee the
Whitelistreference for the different canned options, and to create a custom whitelistThe nofollow link attribute
本文介绍如何使用 JSoup 的 HTML Cleaner 配合 Whitelist 来清理不可信用户提交的 HTML 内容,以防御跨站脚本 (XSS) 攻击。这种方法不仅提高了网站的安全性,还允许用户提交富文本内容。

3998

被折叠的 条评论
为什么被折叠?



