小说爬虫之JAVA代码的实现(附代码)

http://blog.youkuaiyun.com/dapenghehe/article/details/45366395

第一次采用Markdown看看效果。

  • 思路:首先找到一篇小说,获取第一章小说的URL,然后根据该URL来获取该章小说的标题、内容和下一章的URL。之后重复类似动作,就能获取到整篇小说的内容了。

  • 实现方法:这里语言采用==Java==,使用了jsoup。jsoup简单的使用方法可以参考这里

  • 实现过程:首先找到一篇小说,这里以“神墓”为例,我们打开第一章,然后查看网页源代码。 
    在源码中我们可以看到下一页的url、文章标题和小说内容。我们可以先获取一个Document对象,然后分别根据Element的Id或者样式来获取小说的内容、标题或者下一页URL。

    1. 小说中的下一页地址 
      下一页
    2. 小说的标题 
      小说标题
    3. 小说的内容 
      小说内容
    4. 小说的下一页地址也可以从这里获取 
      其他
  • 实现代码:新建一个项目spider,导入jsoup-1.7.3.jar包。

    1. 新建com.dapeng.bean包,在该包下新建类Article。其主要代码如下:
<code class="language-java hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.bean;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">Article</span> {</span>

    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String id;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//id</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String title;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//标题</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String content;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//内容</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String url;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//当前章节url</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String nextUrl;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//下一章url</span>
    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
    *省略getter、setter方法
    */</span>
    <span class="hljs-annotation" style="color: rgb(155, 133, 157); box-sizing: border-box;">@Override</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> String <span class="hljs-title" style="box-sizing: border-box;">toString</span>() {
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Article [id="</span> + id + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", title="</span> + title + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", content="</span>
                + content + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", url="</span> + url + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", nextUrl="</span> + nextUrl + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"]"</span>;
    }
}</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li></ul>

2.新建com.dapeng.method包,在该包下新建UtilMethod类。主要用于实现获取文章标题、内容、下一章URL等方法。其代码如下:

<code class="language-java hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.method;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.io.IOException;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Matcher;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Pattern;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.Jsoup;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Document;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Element;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">UtilMethod</span> {</span>

    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据url获取Document对象
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url 小说章节url
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> Document对象
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Document <span class="hljs-title" style="box-sizing: border-box;">getDocument</span>(String url){
        Document doc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">try</span> {
            doc = Jsoup.connect(url).timeout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5000</span>).get();
        } <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">catch</span> (IOException e) {
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated catch block</span>
            e.printStackTrace();
        }
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc;
    }
    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据获取的Document对象找到章节标题
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 标题
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getTitle</span>(Document doc){
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"title"</span>).text();
    }

    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据获取的Document对象找到小说内容
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 内容
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getContent</span>(Document doc){
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span>(doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>) != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>){
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>).text();
        }<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
        }

    }
    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据获取的Document对象找到下一章的Url地址
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 下一章Url
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getNextUrl</span>(Document doc){
        Element ul = doc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"ul"</span>).first();
        String regex = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<li><a href=\"(.*?)\">下一页<\\/a><\\/li>"</span>;
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(ul.toString());
        Document nextDoc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (matcher.find()) {
            nextDoc = Jsoup.parse(matcher.group());
            Element href = nextDoc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"a"</span>).first();
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/"</span> + href.attr(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"href"</span>);
        }<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
        }


    }

    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据url获取id
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> id 
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getId</span>(String url){
        String urlSpilts[] = url.split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/"</span>);
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> (urlSpilts[urlSpilts.length - <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]).split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"\\."</span>)[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>];
    }

    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     * 根据小说的Url获取一个Article对象
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span>
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Article <span class="hljs-title" style="box-sizing: border-box;">getArticle</span>(String url){
        Article article = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> Article();        
        article.setUrl(url);
        Document doc = getDocument(url);
        article.setId(getId(url));
        article.setTitle(getTitle(doc));
        article.setNextUrl(getNextUrl(doc));
        article.setContent(getContent(doc));
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> article;
    }


}
</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li><li style="box-sizing: border-box; padding: 0px 5px;">34</li><li style="box-sizing: border-box; padding: 0px 5px;">35</li><li style="box-sizing: border-box; padding: 0px 5px;">36</li><li style="box-sizing: border-box; padding: 0px 5px;">37</li><li style="box-sizing: border-box; padding: 0px 5px;">38</li><li style="box-sizing: border-box; padding: 0px 5px;">39</li><li style="box-sizing: border-box; padding: 0px 5px;">40</li><li style="box-sizing: border-box; padding: 0px 5px;">41</li><li style="box-sizing: border-box; padding: 0px 5px;">42</li><li style="box-sizing: border-box; padding: 0px 5px;">43</li><li style="box-sizing: border-box; padding: 0px 5px;">44</li><li style="box-sizing: border-box; padding: 0px 5px;">45</li><li style="box-sizing: border-box; padding: 0px 5px;">46</li><li style="box-sizing: border-box; padding: 0px 5px;">47</li><li style="box-sizing: border-box; padding: 0px 5px;">48</li><li style="box-sizing: border-box; padding: 0px 5px;">49</li><li style="box-sizing: border-box; padding: 0px 5px;">50</li><li style="box-sizing: border-box; padding: 0px 5px;">51</li><li style="box-sizing: border-box; padding: 0px 5px;">52</li><li style="box-sizing: border-box; padding: 0px 5px;">53</li><li style="box-sizing: border-box; padding: 0px 5px;">54</li><li style="box-sizing: border-box; padding: 0px 5px;">55</li><li style="box-sizing: border-box; padding: 0px 5px;">56</li><li style="box-sizing: border-box; padding: 0px 5px;">57</li><li style="box-sizing: border-box; padding: 0px 5px;">58</li><li style="box-sizing: border-box; padding: 0px 5px;">59</li><li style="box-sizing: border-box; padding: 0px 5px;">60</li><li style="box-sizing: border-box; padding: 0px 5px;">61</li><li style="box-sizing: border-box; padding: 0px 5px;">62</li><li style="box-sizing: border-box; padding: 0px 5px;">63</li><li style="box-sizing: border-box; padding: 0px 5px;">64</li><li style="box-sizing: border-box; padding: 0px 5px;">65</li><li style="box-sizing: border-box; padding: 0px 5px;">66</li><li style="box-sizing: border-box; padding: 0px 5px;">67</li><li style="box-sizing: border-box; padding: 0px 5px;">68</li><li style="box-sizing: border-box; padding: 0px 5px;">69</li><li style="box-sizing: border-box; padding: 0px 5px;">70</li><li style="box-sizing: border-box; padding: 0px 5px;">71</li><li style="box-sizing: border-box; padding: 0px 5px;">72</li><li style="box-sizing: border-box; padding: 0px 5px;">73</li><li style="box-sizing: border-box; padding: 0px 5px;">74</li><li style="box-sizing: border-box; padding: 0px 5px;">75</li><li style="box-sizing: border-box; padding: 0px 5px;">76</li><li style="box-sizing: border-box; padding: 0px 5px;">77</li><li style="box-sizing: border-box; padding: 0px 5px;">78</li><li style="box-sizing: border-box; padding: 0px 5px;">79</li><li style="box-sizing: border-box; padding: 0px 5px;">80</li><li style="box-sizing: border-box; padding: 0px 5px;">81</li><li style="box-sizing: border-box; padding: 0px 5px;">82</li><li style="box-sizing: border-box; padding: 0px 5px;">83</li><li style="box-sizing: border-box; padding: 0px 5px;">84</li><li style="box-sizing: border-box; padding: 0px 5px;">85</li><li style="box-sizing: border-box; padding: 0px 5px;">86</li><li style="box-sizing: border-box; padding: 0px 5px;">87</li><li style="box-sizing: border-box; padding: 0px 5px;">88</li><li style="box-sizing: border-box; padding: 0px 5px;">89</li><li style="box-sizing: border-box; padding: 0px 5px;">90</li><li style="box-sizing: border-box; padding: 0px 5px;">91</li><li style="box-sizing: border-box; padding: 0px 5px;">92</li><li style="box-sizing: border-box; padding: 0px 5px;">93</li><li style="box-sizing: border-box; padding: 0px 5px;">94</li><li style="box-sizing: border-box; padding: 0px 5px;">95</li><li style="box-sizing: border-box; padding: 0px 5px;">96</li><li style="box-sizing: border-box; padding: 0px 5px;">97</li><li style="box-sizing: border-box; padding: 0px 5px;">98</li><li style="box-sizing: border-box; padding: 0px 5px;">99</li><li style="box-sizing: border-box; padding: 0px 5px;">100</li><li style="box-sizing: border-box; padding: 0px 5px;">101</li><li style="box-sizing: border-box; padding: 0px 5px;">102</li></ul>

3.新建com.dapeng.test包,用于测试获取整篇小说。新建GetArticles类。其代码如下:

<code class="language-java hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.test;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.method.UtilMethod;

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">GetArticles</span> {</span>

    <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
     *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> args
     */</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">void</span> <span class="hljs-title" style="box-sizing: border-box;">main</span>(String[] args) {
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated method stub</span>

        String firstUrl = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/832882.html"</span>;
        Article article = UtilMethod.getArticle(firstUrl);
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">while</span>(article.getNextUrl() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && article.getContent() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && !article.getId().equals(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"996627"</span>)){
            article = UtilMethod.getArticle(article.getNextUrl());
            System.out.println(article.getId()+<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"----"</span>+article.getTitle());
        }
    }

}
</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li></ul>

运行GetArticles方法,可以看到类似如下的效果:

<code class="hljs brainfuck has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832883</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第二章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">沧海桑田</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832884</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第三章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">人世悠悠</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832885</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第四章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">无解之谜</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832886</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第五章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">惊艳</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832887</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第六章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">公主</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832888</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第七章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">小恶魔</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832889</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第八章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">烈火仙莲</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832890</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第九章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">百丈巨蛇</span>
<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>

到这里该代码就告一段落了。后续的就不再进行了,比如说将小说插入数据库,然后自己搭建一个小说站点,或者将小说写入到一个txt文档中,不用再看那烦人的广告了。等等。。这里就不再进行了。

最后代码附上。


评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值