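Sometimes an API call returns an HTML page rather than structured data. The backend receives the page as an HTML string and has to parse it before passing the useful parts to the frontend, for example a URL or some other piece of content embedded in the HTML.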
- Get the jsoup jar. It can be downloaded from the Maven repository. Download page:
jsoup repository download page
A parsing example:
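import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.junit.Test;

public class JsoupParseTest { // class name is illustrative, not from the original post
    @Test
    public void test5() {
        String html = "<html><head><title>302 Moved Temporarily</title></head>\n" +
                "<body bgcolor=\"#FFFFFF\">\n" +
                "<p>This document you requested has moved \n" +
                "temporarily.</p>\n" +
                "<p>It's now at <a href=\"https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e\">https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e</a>.</p>\n" +
                "</body></html>";
        Document doc = Jsoup.parse(html);
        System.out.println(doc); // prints the parsed HTML document, tags included
        System.out.println("---------------------\n" + doc.text()); // prints the text content only
        // Select every <a> element; text() returns the link text, which here equals the href
        Elements element = doc.getElementsByTag("a");
        System.out.println("---------------------\n" + element.text());
    }
}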
The output is:
<html>
<head>
<title>302 Moved Temporarily</title>
</head>
<body bgcolor="#FFFFFF">
<p>This document you requested has moved temporarily.</p>
<p>It's now at <a href="https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e">https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e</a>.</p>
</body>
</html>
---------------------
302 Moved Temporarily This document you requested has moved temporarily. It's now at https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e.
---------------------
https://pay.ceibs.edu/paycenter/dopay/selectprovider?u=4f54d017-8605-4c64-8367-96a38257a92e
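Note that the test above prints the link text, which matches the URL only because this page repeats the address inside the `<a>` tag. When the goal is the address itself, reading the `href` attribute is likely the safer choice. The following is a minimal sketch of that idea, not part of the original example: the class name `HrefExtractor`, the helper `firstLinkHref`, and the `example.com` URL are made up for illustration, and `selectFirst` assumes a reasonably recent jsoup release.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HrefExtractor { // hypothetical class name
    /** Returns the href of the first <a> element that has one, or null if there is none. */
    static String firstLinkHref(String html) {
        Document doc = Jsoup.parse(html);
        // CSS selector: the first <a> element carrying an href attribute
        Element link = doc.selectFirst("a[href]");
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        // Sample input (made up): the link text differs from the target address
        String html = "<p>It's now at <a href=\"https://example.com/pay?u=123\">this page</a>.</p>";
        System.out.println(firstLinkHref(html)); // prints https://example.com/pay?u=123
    }
}
```

With this input, `text()` on the same element would return `this page`, so only the attribute lookup yields the address. Returning `null` when no link exists lets the caller decide how to handle a page without a redirect target.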
More detailed examples can be found here:
Detailed examples