Getting a page's header map, title, and keywords with Jsoup

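This article shows how to use Java and the Jsoup library to fetch a web page's title and keywords meta information. It covers the dependency configuration and the scraping code.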

Requirement:

Collect the title and keywords of a page.

Implementation:

Dependency:

<dependency>
	<groupId>org.jsoup</groupId>
	<artifactId>jsoup</artifactId>
	<version>1.6.3</version>
</dependency>

Code:

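// Requires: org.jsoup.Connection, org.jsoup.Jsoup, org.jsoup.nodes.Document, java.util.Map
// Assumption: url is a String holding the address of the page to fetch (not shown in the original snippet)
Connection connection = Jsoup.connect(url);
// Send the request and keep the raw response
Connection.Response response = connection.execute();
// Response headers as a name -> value map
Map<String, String> headerMap = response.headers();
// Parse the response body into a DOM
String body = response.body();
Document document = Jsoup.parse(body);
// <title> text and the content of <meta name="keywords">; both are empty strings if absent
String title = document.head().select("title").text();
String keywords = document.head().select("meta[name=keywords]").attr("content");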
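
Once headerMap and keywords are available, they usually need a little post-processing. The following is only a minimal sketch using the Java standard library, not part of Jsoup: PageMetaUtil, findHeader, and splitKeywords are hypothetical names introduced here for illustration. It assumes that header names may differ in case between servers and that keywords are separated by ASCII or full-width commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Jsoup.
class PageMetaUtil {

    // Returns the value of the first header whose name matches ignoring case,
    // or null if no such header exists.
    static String findHeader(Map<String, String> headers, String name) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase(name)) {
                return entry.getValue();
            }
        }
        return null;
    }

    // Splits a keywords string on ASCII or full-width commas (assumed separators),
    // trims each item, and drops empty items.
    static List<String> splitKeywords(String keywords) {
        List<String> result = new ArrayList<>();
        if (keywords == null) {
            return result;
        }
        for (String part : keywords.split("[,\uFF0C]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With the variables from the snippet above, PageMetaUtil.findHeader(headerMap, "content-type") would look up the content type regardless of how the server capitalized the header name, and PageMetaUtil.splitKeywords(keywords) would return an empty list when the page has no keywords meta tag, since attr("content") yields an empty string in that case.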