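WebClient, the main class of HtmlUnit, is a web-scraping tool that behaves like a virtual browser. Its key strength is scraping dynamic pages, such as pages whose content is generated by JavaScript (something Jsoup does not seem able to handle, since it does not execute scripts).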
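To make the difference concrete, here is a minimal sketch of what a tool that never runs JavaScript gets to see. Everything in it is an assumption made for illustration: the class name `StaticVsDynamicDemo`, the sample page, and the regex-based tag stripping, which is only a stand-in for a static parser and not how Jsoup or HtmlUnit actually work.

```java
import java.util.regex.Pattern;

public class StaticVsDynamicDemo {

    // Hypothetical sample page: the greeting only exists after the script has run.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div id=\"content\"></div>"
        + "<script>document.getElementById('content').innerHTML = '<p>Hello ' + 'from JS</p>';</script>"
        + "</body></html>";

    // Naive static extraction: drop <script> elements, then drop all remaining tags.
    // This approximates what is visible when the markup is read without executing scripts.
    static String staticText(String html) {
        String withoutScripts = Pattern
            .compile("<script\\b[^>]*>.*?</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
            .matcher(html)
            .replaceAll("");
        return withoutScripts.replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        String text = staticText(SAMPLE_HTML);
        // The <div> is empty in the served markup, so no visible text is found.
        System.out.println("static text is empty: " + text.isEmpty());
    }
}
```

The served markup contains an empty `<div>`, so the static view finds no text at all. A browser-like client that executes the script would see the greeting inside that `<div>`, which is the gap WebClient is meant to fill.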
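First, add the libraries. The main one is htmlunit, but it is split across many jars, so a whole pile of other dependency jars also has to be on the classpath before it will run.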
Here is a simple usage example:
package j2seTest2;

import java.net.URL;

import com.gargoylesoftware.htmlunit.JavaScriptPage;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class WebClientTest {
    public static void main(String[] args) {
        // final Str