Parsing a body fragment

License: CC 4.0 BY-SA

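This article describes how to parse an HTML fragment with the Jsoup.parseBodyFragment(String html) method, and stresses the importance of avoiding cross-site scripting attacks when accepting user input. It also suggests a way to clean that input so that the page stays safe for its users.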

[size=large]Problem[/size]
You have a fragment of body HTML (e.g. a div containing a couple of p tags, as opposed to a full HTML document) that you want to parse. Perhaps it was provided by a user submitting a comment, or editing the body of a page in a CMS.

[size=large]Solution[/size]
Use the Jsoup.parseBodyFragment(String html) method.

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();

[size=large]Description[/size]
The parseBodyFragment method creates an empty shell document and inserts the parsed HTML into the body element. If you used the normal Jsoup.parse(String html) method, you would generally get the same result, but explicitly treating the input as a body fragment ensures that any bozo HTML provided by the user is parsed into the body element.
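
As a rough illustration of the paragraph above, the following sketch parses the same unclosed fragment and checks where the content ends up. The class and method names (BodyFragmentSketch, paragraphText, headIsEmpty) are made up for this example; only the jsoup calls come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class for illustration; not part of jsoup.
class BodyFragmentSketch {
    // Text of all <p> elements found inside the shell document's body.
    static String paragraphText(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        return doc.body().select("p").text();
    }

    // True when nothing from the fragment was placed in <head>.
    static boolean headIsEmpty(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        return doc.head().children().isEmpty();
    }

    public static void main(String[] args) {
        String html = "<div><p>Lorem ipsum.</p>"; // the <div> is deliberately left unclosed
        System.out.println(paragraphText(html));
        System.out.println(headIsEmpty(html));
        // Print the whole shell document to see the html/head/body wrapper jsoup added.
        System.out.println(Jsoup.parseBodyFragment(html).outerHtml());
    }
}
```

Printing the full document is a convenient way to see how jsoup repaired the unclosed div before you work with the body element.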

The Document.body() method returns the document's body element, whose children are the nodes parsed from the fragment; it is roughly equivalent to doc.getElementsByTag("body").first().
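
A minimal sketch of how body() relates to getElementsByTag("body"): body() hands back a single Element, while getElementsByTag returns an Elements collection whose first entry is that same element. The class and method names here (BodyAccessSketch, topLevelChildCount, bodyMatchesTagLookup) are illustrative assumptions, not jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class for illustration; not part of jsoup.
class BodyAccessSketch {
    // Number of element children sitting directly under <body>.
    static int topLevelChildCount(String fragment) {
        Element body = Jsoup.parseBodyFragment(fragment).body();
        return body.children().size();
    }

    // Checks that body() and the first match of a tag lookup are the same element.
    static boolean bodyMatchesTagLookup(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        Elements bodies = doc.getElementsByTag("body");
        return doc.body() == bodies.first();
    }

    public static void main(String[] args) {
        String html = "<p>One</p><p>Two</p>";
        Element body = Jsoup.parseBodyFragment(html).body();
        // Walk the parsed fragment through the body element's children.
        for (Element child : body.children()) {
            System.out.println(child.tagName() + ": " + child.text());
        }
        System.out.println(bodyMatchesTagLookup(html));
    }
}
```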

[size=large]Stay safe[/size]
If you are going to accept HTML input from a user, you need to be careful to avoid cross-site scripting attacks. See the documentation for the Whitelist based cleaner (the class is named Safelist in newer jsoup releases), and clean the input with Jsoup.clean(String bodyHtml, Whitelist whitelist).
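
The cleaning step might look like the sketch below. It assumes a recent jsoup release, where the class is called Safelist; on older releases you would substitute Whitelist, which offers the same basic() preset. The CleanSketch class and its sanitize method are hypothetical names for this example, and the basic preset is only one possible choice of allowed tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical helper class for illustration; not part of jsoup.
class CleanSketch {
    // Keeps only the tags and attributes permitted by the basic preset.
    // Assumption: a jsoup version that ships Safelist (older ones use Whitelist).
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String comment = "<p>Hello <script>alert(1)</script><b>world</b></p>";
        // The script element is not on the list, so it is dropped from the output.
        System.out.println(sanitize(comment));

        String link = "<a href=\"http://example.com/\" onclick=\"steal()\">link</a>";
        // Event handler attributes such as onclick are not permitted by the preset.
        System.out.println(sanitize(link));
    }
}
```

Cleaning at the point where input is accepted, and again before it is rendered if the allowed list changes, keeps the stored HTML and the displayed HTML consistent.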