Parsing a body fragment

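This article explains how to parse an HTML fragment with the Jsoup.parseBodyFragment(String html) method, and stresses the importance of guarding against cross-site scripting attacks when accepting user input. It also suggests how to clean that input so the page stays safe for its users.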

[size=large]Problem[/size]
You have a fragment of body HTML (e.g. a div containing a couple of p tags, as opposed to a full HTML document) that you want to parse. Perhaps it was provided by a user submitting a comment, or editing the body of a page in a CMS.

[size=large]Solution[/size]
Use the Jsoup.parseBodyFragment(String html) method.

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();

[size=large]Description[/size]
The parseBodyFragment method creates an empty shell document and inserts the parsed HTML into the body element. If you used the normal Jsoup.parse(String html) method, you would generally get the same result, but explicitly treating the input as a body fragment ensures that any malformed or unexpected HTML provided by the user is parsed into the body element.

The Document.body() method returns the document's body element, through which you can reach the parsed fragment as its children; it is roughly equivalent to taking the first element of doc.getElementsByTag("body").
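The behaviour described above can be checked with a short sketch. The class name BodyFragmentExample and the helper parseFragment are hypothetical names chosen for illustration; only Jsoup.parseBodyFragment, Document.body(), Document.head() and the element accessors are jsoup calls. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BodyFragmentExample {
    // Hypothetical helper: parses a body fragment and returns the shell
    // document that jsoup builds around it.
    static Document parseFragment(String html) {
        return Jsoup.parseBodyFragment(html);
    }

    public static void main(String[] args) {
        // The closing </div> is missing on purpose; the parser closes it.
        Document doc = parseFragment("<div><p>Lorem ipsum.</p>");
        Element body = doc.body();

        // The fragment ends up as the only child of body.
        System.out.println(body.children().size());
        System.out.println(body.child(0).tagName());

        // The paragraph text is reachable through the usual selectors.
        System.out.println(body.select("p").text());

        // The head of the shell document stays empty.
        System.out.println(doc.head().children().isEmpty());
    }
}
```

Working from body rather than from the document root keeps the rest of the code independent of the shell elements that jsoup adds around the fragment.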

[size=large]Stay safe[/size]
If you are going to accept HTML input from a user, you need to be careful to avoid cross-site scripting attacks. See the documentation for the Whitelist based cleaner (the class is named Safelist in recent jsoup releases), and clean the input with Jsoup.clean(String bodyHtml, Whitelist whitelist).
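As a minimal sketch of that cleaning step, the example below strips a script element from untrusted input. It assumes a recent jsoup release, where the class is named Safelist; on older releases the same calls should be available on Whitelist. The class name CleanInputExample, the helper sanitize, and the choice of the basic() preset are illustrative assumptions, not a recommendation from the original text.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanInputExample {
    // Hypothetical helper: returns the input with every tag and attribute
    // outside the basic safelist removed.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String unsafe = "<p>Hello <b>world</b><script>alert('xss')</script></p>";

        // Simple text formatting such as <p> and <b> survives;
        // the script element and its contents are dropped.
        System.out.println(sanitize(unsafe));
    }
}
```

Which preset is appropriate depends on how much markup users should be allowed to submit; the basic preset permits simple text formatting and links, while stricter and looser presets exist on the same class.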