Htmlparser: getting the HTML text after replacing its links


License: CC 4.0 BY-SA


This post shows how to replace all the links in an HTML file in bulk. The Java implementation uses the Htmlparser library to walk every node of the document and rewrite each link.


Requirement: replace every link in an HTML file and get the resulting HTML back.
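For simple, well-formed input, this requirement can be approximated without any HTML library. Below is a minimal sketch that uses only the standard library. It swaps in regular expressions in place of a parser, so it is not the approach this post uses. `HrefRewriter` and `rewriteLinks` are hypothetical names.

The sketch works in three steps:
1. Find each `<a ...>` start tag.
2. Replace the value of its `href` attribute with a double-quoted, escaped new value.
3. Copy everything else verbatim.

The sketch does not understand comments, `<script>`/`<style>` content, or broken markup. A link inside an HTML comment, for example, would be rewritten too. That limitation is the usual reason to use a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based sketch (no HTML parser): rewrites the href of every <a ...> start tag.
final class HrefRewriter {
    // An <a ...> start tag, case-insensitive; quoted attribute values may contain '>'.
    // Does not match <abbr>, <area> or the </a> end tag.
    private static final Pattern A_START_TAG = Pattern.compile(
            "<a\\b(?:[^>\"']|\"[^\"]*\"|'[^']*')*>", Pattern.CASE_INSENSITIVE);

    // One attribute inside a tag: whitespace, a name, then an optional =value
    // where the value is double-quoted, single-quoted or unquoted.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "\\s+([^\\s=>/]+)(?:\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+))?");

    static String rewriteLinks(String html, String newHref) {
        // Always emit a double-quoted value, escaping characters that would break it.
        String quotedHref = "\"" + newHref.replace("&", "&amp;").replace("\"", "&quot;") + "\"";
        Matcher tag = A_START_TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (tag.find()) {
            out.append(html, last, tag.start()); // text between <a> tags is copied unchanged
            out.append(rewriteTag(tag.group(), quotedHref));
            last = tag.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    private static String rewriteTag(String tag, String quotedHref) {
        Matcher attr = ATTRIBUTE.matcher(tag);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (attr.find()) {
            // Only an href attribute that has a value is replaced;
            // <a name="..."> anchors without href are left alone.
            if (attr.group(1).equalsIgnoreCase("href") && attr.group(2) != null) {
                out.append(tag, last, attr.start(2)).append(quotedHref);
                last = attr.end(2);
            }
        }
        out.append(tag, last, tag.length());
        return out.toString();
    }
}
```

For example, `HrefRewriter.rewriteLinks("<a href='old'>x</a>", "http://www.163.com")` returns `<a href="http://www.163.com">x</a>`. Other attributes and `<a>` tags without an `href` are kept as they were.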

Approach: parse the document with Htmlparser.

The code is as follows:

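import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.Remark;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class DoReplaceHtmlHref implements Callable<String> {

	// JDK logger; the original snippet referenced an undeclared "log" field
	private static final Logger log = Logger.getLogger(DoReplaceHtmlHref.class.getName());

	// New link target. It needs a scheme: "www.163.com" alone would be a relative link.
	private static final String NEW_HREF = "http://www.163.com";

	private final String content;

	public DoReplaceHtmlHref(String content) {
		this.content = content;
	}

	public String call() throws Exception {
		Parser myParser = new Parser();
		StringBuilder sbContent = new StringBuilder();

		try {
			myParser.setInputHTML(content);

			// Collect every node of the page in document order:
			// each tag, then its children, then its end tag
			NodeList nodes = myParser.extractAllNodesThatMatch(new NodeFilter() {
				public boolean accept(Node node) {
					return true;
				}
			});

			for (int i = 0; i < nodes.size(); i++) {
				Node node = nodes.elementAt(i);
				if (node instanceof LinkTag) {
					// Link node: change only the href and keep the other attributes.
					// The link text and the </a> end tag follow as separate nodes.
					LinkTag linkTag = (LinkTag) node;
					if (linkTag.getAttribute("HREF") != null) {
						linkTag.setLink(NEW_HREF);
					}
					sbContent.append('<').append(linkTag.getText()).append('>');
				} else if (node instanceof TextNode) {
					// Text node: copy its content unchanged
					sbContent.append(((TextNode) node).getText());
				} else if (node instanceof Remark) {
					// Comment: getText() omits the <!-- --> delimiters, so use toHtml()
					sbContent.append(node.toHtml());
				} else {
					// Any other tag: getText() is the tag without its angle brackets
					sbContent.append('<').append(node.getText()).append('>');
				}
			}
		} catch (ParserException e) {
			// Log and rethrow instead of silently returning a partial document
			log.log(Level.SEVERE, "failed to parse HTML", e);
			throw e;
		}
		return sbContent.toString();
	}
}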
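`DoReplaceHtmlHref` implements `Callable<String>`. That suggests it is meant to run on a thread pool when many pages are processed in bulk. Below is a hypothetical sketch of such a driver, assuming one task per document. `BatchRunner` and `runAll` are illustrative names, not part of the original code. It relies on `ExecutorService.invokeAll`, which runs every task and returns the futures in input order, so the results line up with the input documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs one Callable<String> per document on a fixed-size thread pool.
final class BatchRunner {

    // Returns the results in the same order as the tasks. invokeAll blocks until
    // every task has finished; a task that threw surfaces from Future.get() as an
    // ExecutionException whose cause is the original exception.
    static List<String> runAll(List<? extends Callable<String>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = pool.invokeAll(tasks);
            List<String> results = new ArrayList<>(futures.size());
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once the batch is done
        }
    }
}
```

To drive the class above, build the list with one `DoReplaceHtmlHref` per HTML string and pass it to `runAll`. A task that throws is reported as an `ExecutionException` whose cause is the original exception.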