http://blog.youkuaiyun.com/dapenghehe/article/details/45366395
第一次采用Markdown看看效果。
-
思路:首先找到一篇小说,获取第一章小说的URL,然后根据该URL来获取该章小说的标题、内容和下一章的URL。之后重复类似动作,就能获取到整篇小说的内容了。
-
实现过程:首先找到一篇小说,这里以“神墓”为例,我们打开第一章,然后查看网页源代码。
在源码中我们可以看到下一页的url、文章标题和小说内容。我们可以先获取一个Document对象,然后分别根据Element的Id或者样式来获取小说的内容、标题或者下一页URL。- 小说中的下一页地址
- 小说的标题
- 小说的内容
- 小说的下一页地址也可以从这里获取
- 小说中的下一页地址
-
实现代码:新建一个项目spider,导入jsoup-1.7.3.jar包。
- 新建com.dapeng.bean包,在该包下新建类Article。其主要代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.bean; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">Article</span> {</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String id;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//id</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String title;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//标题</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String content;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//内容</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String url;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//当前章节url</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String nextUrl;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//下一章url</span> <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** *省略getter、setter方法 */</span> <span class="hljs-annotation" style="color: rgb(155, 133, 157); box-sizing: border-box;">@Override</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> String <span class="hljs-title" style="box-sizing: border-box;">toString</span>() { <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Article [id="</span> + id + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", title="</span> + title + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", content="</span> + content + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", url="</span> + url + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", nextUrl="</span> + nextUrl + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"]"</span>; } }</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li></ul>
2.新建com.dapeng.method包,在该包下新建UtilMethod类。主要用于实现获取文章标题、内容、下一章URL等方法。其代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.method; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.io.IOException; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Matcher; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Pattern; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.Jsoup; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Document; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Element; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">UtilMethod</span> {</span> <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据url获取Document对象 *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url 小说章节url *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> Document对象 */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Document <span class="hljs-title" style="box-sizing: border-box;">getDocument</span>(String url){ Document doc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">try</span> { doc = Jsoup.connect(url).timeout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5000</span>).get(); } <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">catch</span> (IOException e) { <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated catch block</span> e.printStackTrace(); } <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc; } <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据获取的Document对象找到章节标题 *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 标题 */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getTitle</span>(Document doc){ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"title"</span>).text(); } <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据获取的Document对象找到小说内容 *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 内容 */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getContent</span>(Document doc){ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span>(doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>) != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>){ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>).text(); }<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>; } } <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据获取的Document对象找到下一章的Url地址 *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 下一章Url */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getNextUrl</span>(Document doc){ Element ul = doc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"ul"</span>).first(); String regex = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<li><a href=\"(.*?)\">下一页<\\/a><\\/li>"</span>; Pattern pattern = Pattern.compile(regex); Matcher matcher = pattern.matcher(ul.toString()); Document nextDoc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (matcher.find()) { nextDoc = Jsoup.parse(matcher.group()); Element href = nextDoc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"a"</span>).first(); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/"</span> + href.attr(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"href"</span>); }<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>; } } <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据url获取id *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> id */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getId</span>(String url){ String urlSpilts[] = url.split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/"</span>); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> (urlSpilts[urlSpilts.length - <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]).split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"\\."</span>)[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>]; } <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** * 根据小说的Url获取一个Article对象 *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Article <span class="hljs-title" style="box-sizing: border-box;">getArticle</span>(String url){ Article article = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> Article(); article.setUrl(url); Document doc = getDocument(url); article.setId(getId(url)); article.setTitle(getTitle(doc)); article.setNextUrl(getNextUrl(doc)); article.setContent(getContent(doc)); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> article; } } </code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li><li style="box-sizing: border-box; padding: 0px 5px;">34</li><li style="box-sizing: border-box; padding: 0px 5px;">35</li><li style="box-sizing: border-box; padding: 0px 5px;">36</li><li style="box-sizing: border-box; padding: 0px 5px;">37</li><li style="box-sizing: border-box; padding: 0px 5px;">38</li><li style="box-sizing: border-box; padding: 0px 5px;">39</li><li style="box-sizing: border-box; padding: 0px 5px;">40</li><li style="box-sizing: border-box; padding: 0px 5px;">41</li><li style="box-sizing: border-box; padding: 0px 5px;">42</li><li style="box-sizing: border-box; padding: 0px 5px;">43</li><li style="box-sizing: border-box; padding: 0px 5px;">44</li><li style="box-sizing: border-box; padding: 0px 5px;">45</li><li style="box-sizing: border-box; padding: 0px 5px;">46</li><li style="box-sizing: border-box; padding: 0px 5px;">47</li><li style="box-sizing: border-box; padding: 0px 5px;">48</li><li style="box-sizing: border-box; padding: 0px 5px;">49</li><li style="box-sizing: border-box; padding: 0px 5px;">50</li><li style="box-sizing: border-box; padding: 0px 5px;">51</li><li style="box-sizing: border-box; padding: 0px 5px;">52</li><li style="box-sizing: border-box; padding: 0px 5px;">53</li><li style="box-sizing: border-box; padding: 0px 5px;">54</li><li style="box-sizing: border-box; padding: 0px 5px;">55</li><li style="box-sizing: border-box; padding: 0px 5px;">56</li><li style="box-sizing: border-box; padding: 0px 5px;">57</li><li style="box-sizing: border-box; padding: 0px 5px;">58</li><li style="box-sizing: border-box; padding: 0px 5px;">59</li><li style="box-sizing: border-box; padding: 0px 5px;">60</li><li style="box-sizing: border-box; padding: 0px 5px;">61</li><li style="box-sizing: border-box; padding: 0px 5px;">62</li><li style="box-sizing: border-box; padding: 0px 5px;">63</li><li style="box-sizing: border-box; padding: 0px 5px;">64</li><li style="box-sizing: border-box; padding: 0px 5px;">65</li><li style="box-sizing: border-box; padding: 0px 5px;">66</li><li style="box-sizing: border-box; padding: 0px 5px;">67</li><li style="box-sizing: border-box; padding: 0px 5px;">68</li><li style="box-sizing: border-box; padding: 0px 5px;">69</li><li style="box-sizing: border-box; padding: 0px 5px;">70</li><li style="box-sizing: border-box; padding: 0px 5px;">71</li><li style="box-sizing: border-box; padding: 0px 5px;">72</li><li style="box-sizing: border-box; padding: 0px 5px;">73</li><li style="box-sizing: border-box; padding: 0px 5px;">74</li><li style="box-sizing: border-box; padding: 0px 5px;">75</li><li style="box-sizing: border-box; padding: 0px 5px;">76</li><li style="box-sizing: border-box; padding: 0px 5px;">77</li><li style="box-sizing: border-box; padding: 0px 5px;">78</li><li style="box-sizing: border-box; padding: 0px 5px;">79</li><li style="box-sizing: border-box; padding: 0px 5px;">80</li><li style="box-sizing: border-box; padding: 0px 5px;">81</li><li style="box-sizing: border-box; padding: 0px 5px;">82</li><li style="box-sizing: border-box; padding: 0px 5px;">83</li><li style="box-sizing: border-box; padding: 0px 5px;">84</li><li style="box-sizing: border-box; padding: 0px 5px;">85</li><li style="box-sizing: border-box; padding: 0px 5px;">86</li><li style="box-sizing: border-box; padding: 0px 5px;">87</li><li style="box-sizing: border-box; padding: 0px 5px;">88</li><li style="box-sizing: border-box; padding: 0px 5px;">89</li><li style="box-sizing: border-box; padding: 0px 5px;">90</li><li style="box-sizing: border-box; padding: 0px 5px;">91</li><li style="box-sizing: border-box; padding: 0px 5px;">92</li><li style="box-sizing: border-box; padding: 0px 5px;">93</li><li style="box-sizing: border-box; padding: 0px 5px;">94</li><li style="box-sizing: border-box; padding: 0px 5px;">95</li><li style="box-sizing: border-box; padding: 0px 5px;">96</li><li style="box-sizing: border-box; padding: 0px 5px;">97</li><li style="box-sizing: border-box; padding: 0px 5px;">98</li><li style="box-sizing: border-box; padding: 0px 5px;">99</li><li style="box-sizing: border-box; padding: 0px 5px;">100</li><li style="box-sizing: border-box; padding: 0px 5px;">101</li><li style="box-sizing: border-box; padding: 0px 5px;">102</li></ul>
3.新建com.dapeng.test包,用于测试获取整篇小说。新建GetArticles类。其代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.test; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.method.UtilMethod; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">GetArticles</span> {</span> <span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** *<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> args */</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">void</span> <span class="hljs-title" style="box-sizing: border-box;">main</span>(String[] args) { <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated method stub</span> String firstUrl = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/832882.html"</span>; Article article = UtilMethod.getArticle(firstUrl); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">while</span>(article.getNextUrl() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && article.getContent() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && !article.getId().equals(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"996627"</span>)){ article = UtilMethod.getArticle(article.getNextUrl()); System.out.println(article.getId()+<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"----"</span>+article.getTitle()); } } } </code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li></ul>
运行GetArticles方法,可以看到类似如下的效果:
<code class="hljs brainfuck has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832883</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第二章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">沧海桑田</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832884</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第三章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">人世悠悠</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832885</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第四章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">无解之谜</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832886</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第五章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">惊艳</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832887</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第六章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">公主</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832888</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第七章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">小恶魔</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832889</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第八章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">烈火仙莲</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832890</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第九章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">百丈巨蛇</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>
到这里该代码就告一段落了。后续的就不再进行了,比如说将小说插入数据库,然后自己搭建一个小说站点,或者将小说写入到一个txt文档中,不用再看那烦人的广告了。等等。。这里就不再进行了。
最后代码附上。