http://blog.youkuaiyun.com/dapenghehe/article/details/45366395
第一次采用Markdown看看效果。
-
思路:首先找到一篇小说,获取第一章小说的URL,然后根据该URL来获取该章小说的标题、内容和下一章的URL。之后重复类似动作,就能获取到整篇小说的内容了。
-
实现过程:首先找到一篇小说,这里以“神墓”为例,我们打开第一章,然后查看网页源代码。
在源码中我们可以看到下一页的url、文章标题和小说内容。我们可以先获取一个Document对象,然后分别根据Element的Id或者样式来获取小说的内容、标题或者下一页URL。- 小说中的下一页地址
- 小说的标题
- 小说的内容
- 小说的下一页地址也可以从这里获取
- 小说中的下一页地址
-
实现代码:新建一个项目spider,导入jsoup-1.7.3.jar包。
- 新建com.dapeng.bean包,在该包下新建类Article。其主要代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.bean;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">Article</span> {</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String id;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//id</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String title;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//标题</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String content;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//内容</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String url;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//当前章节url</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> String nextUrl;<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//下一章url</span>
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
*省略getter、setter方法
*/</span>
<span class="hljs-annotation" style="color: rgb(155, 133, 157); box-sizing: border-box;">@Override</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> String <span class="hljs-title" style="box-sizing: border-box;">toString</span>() {
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Article [id="</span> + id + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", title="</span> + title + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", content="</span>
+ content + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", url="</span> + url + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">", nextUrl="</span> + nextUrl + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"]"</span>;
}
}</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li></ul>
2.新建com.dapeng.method包,在该包下新建UtilMethod类。主要用于实现获取文章标题、内容、下一章URL等方法。其代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.method;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.io.IOException;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Matcher;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> java.util.regex.Pattern;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.Jsoup;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Document;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> org.jsoup.nodes.Element;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">UtilMethod</span> {</span>
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据url获取Document对象
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url 小说章节url
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> Document对象
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Document <span class="hljs-title" style="box-sizing: border-box;">getDocument</span>(String url){
Document doc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">try</span> {
doc = Jsoup.connect(url).timeout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5000</span>).get();
} <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">catch</span> (IOException e) {
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated catch block</span>
e.printStackTrace();
}
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc;
}
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据获取的Document对象找到章节标题
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 标题
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getTitle</span>(Document doc){
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"title"</span>).text();
}
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据获取的Document对象找到小说内容
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 内容
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getContent</span>(Document doc){
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span>(doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>) != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>){
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> doc.getElementById(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"content"</span>).text();
}<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
}
}
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据获取的Document对象找到下一章的Url地址
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> doc
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> 下一章Url
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getNextUrl</span>(Document doc){
Element ul = doc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"ul"</span>).first();
String regex = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<li><a href=\"(.*?)\">下一页<\\/a><\\/li>"</span>;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(ul.toString());
Document nextDoc = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (matcher.find()) {
nextDoc = Jsoup.parse(matcher.group());
Element href = nextDoc.select(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"a"</span>).first();
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/"</span> + href.attr(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"href"</span>);
}<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span>{
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span>;
}
}
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据url获取id
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span> id
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> String <span class="hljs-title" style="box-sizing: border-box;">getId</span>(String url){
String urlSpilts[] = url.split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/"</span>);
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> (urlSpilts[urlSpilts.length - <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]).split(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"\\."</span>)[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>];
}
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
* 根据小说的Url获取一个Article对象
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> url
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @return</span>
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> Article <span class="hljs-title" style="box-sizing: border-box;">getArticle</span>(String url){
Article article = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> Article();
article.setUrl(url);
Document doc = getDocument(url);
article.setId(getId(url));
article.setTitle(getTitle(doc));
article.setNextUrl(getNextUrl(doc));
article.setContent(getContent(doc));
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> article;
}
}
</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li><li style="box-sizing: border-box; padding: 0px 5px;">34</li><li style="box-sizing: border-box; padding: 0px 5px;">35</li><li style="box-sizing: border-box; padding: 0px 5px;">36</li><li style="box-sizing: border-box; padding: 0px 5px;">37</li><li style="box-sizing: border-box; padding: 0px 5px;">38</li><li style="box-sizing: border-box; padding: 0px 5px;">39</li><li style="box-sizing: border-box; padding: 0px 5px;">40</li><li style="box-sizing: border-box; padding: 0px 5px;">41</li><li style="box-sizing: border-box; padding: 0px 5px;">42</li><li style="box-sizing: border-box; padding: 0px 5px;">43</li><li style="box-sizing: border-box; padding: 0px 5px;">44</li><li style="box-sizing: border-box; padding: 0px 5px;">45</li><li style="box-sizing: border-box; padding: 0px 5px;">46</li><li style="box-sizing: border-box; padding: 0px 5px;">47</li><li style="box-sizing: border-box; padding: 0px 5px;">48</li><li style="box-sizing: border-box; padding: 0px 5px;">49</li><li style="box-sizing: border-box; padding: 0px 5px;">50</li><li style="box-sizing: border-box; padding: 0px 5px;">51</li><li style="box-sizing: border-box; padding: 0px 5px;">52</li><li style="box-sizing: border-box; padding: 0px 5px;">53</li><li style="box-sizing: border-box; padding: 0px 5px;">54</li><li style="box-sizing: border-box; padding: 0px 5px;">55</li><li style="box-sizing: border-box; padding: 0px 5px;">56</li><li style="box-sizing: border-box; padding: 0px 5px;">57</li><li style="box-sizing: border-box; padding: 0px 5px;">58</li><li style="box-sizing: border-box; padding: 0px 5px;">59</li><li style="box-sizing: border-box; padding: 0px 5px;">60</li><li style="box-sizing: border-box; padding: 0px 5px;">61</li><li style="box-sizing: border-box; padding: 0px 5px;">62</li><li style="box-sizing: border-box; padding: 0px 5px;">63</li><li style="box-sizing: border-box; padding: 0px 5px;">64</li><li style="box-sizing: border-box; padding: 0px 5px;">65</li><li style="box-sizing: border-box; padding: 0px 5px;">66</li><li style="box-sizing: border-box; padding: 0px 5px;">67</li><li style="box-sizing: border-box; padding: 0px 5px;">68</li><li style="box-sizing: border-box; padding: 0px 5px;">69</li><li style="box-sizing: border-box; padding: 0px 5px;">70</li><li style="box-sizing: border-box; padding: 0px 5px;">71</li><li style="box-sizing: border-box; padding: 0px 5px;">72</li><li style="box-sizing: border-box; padding: 0px 5px;">73</li><li style="box-sizing: border-box; padding: 0px 5px;">74</li><li style="box-sizing: border-box; padding: 0px 5px;">75</li><li style="box-sizing: border-box; padding: 0px 5px;">76</li><li style="box-sizing: border-box; padding: 0px 5px;">77</li><li style="box-sizing: border-box; padding: 0px 5px;">78</li><li style="box-sizing: border-box; padding: 0px 5px;">79</li><li style="box-sizing: border-box; padding: 0px 5px;">80</li><li style="box-sizing: border-box; padding: 0px 5px;">81</li><li style="box-sizing: border-box; padding: 0px 5px;">82</li><li style="box-sizing: border-box; padding: 0px 5px;">83</li><li style="box-sizing: border-box; padding: 0px 5px;">84</li><li style="box-sizing: border-box; padding: 0px 5px;">85</li><li style="box-sizing: border-box; padding: 0px 5px;">86</li><li style="box-sizing: border-box; padding: 0px 5px;">87</li><li style="box-sizing: border-box; padding: 0px 5px;">88</li><li style="box-sizing: border-box; padding: 0px 5px;">89</li><li style="box-sizing: border-box; padding: 0px 5px;">90</li><li style="box-sizing: border-box; padding: 0px 5px;">91</li><li style="box-sizing: border-box; padding: 0px 5px;">92</li><li style="box-sizing: border-box; padding: 0px 5px;">93</li><li style="box-sizing: border-box; padding: 0px 5px;">94</li><li style="box-sizing: border-box; padding: 0px 5px;">95</li><li style="box-sizing: border-box; padding: 0px 5px;">96</li><li style="box-sizing: border-box; padding: 0px 5px;">97</li><li style="box-sizing: border-box; padding: 0px 5px;">98</li><li style="box-sizing: border-box; padding: 0px 5px;">99</li><li style="box-sizing: border-box; padding: 0px 5px;">100</li><li style="box-sizing: border-box; padding: 0px 5px;">101</li><li style="box-sizing: border-box; padding: 0px 5px;">102</li></ul>
3.新建com.dapeng.test包,用于测试获取整篇小说。新建GetArticles类。其代码如下:
<code class="language-java hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">package</span> com.dapeng.test;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.bean.Article;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> com.dapeng.method.UtilMethod;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">GetArticles</span> {</span>
<span class="hljs-javadoc" style="color: rgb(136, 0, 0); box-sizing: border-box;">/**
*<span class="hljs-javadoctag" style="color: rgb(102, 0, 102); box-sizing: border-box;"> @param</span> args
*/</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">public</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">static</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">void</span> <span class="hljs-title" style="box-sizing: border-box;">main</span>(String[] args) {
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// TODO Auto-generated method stub</span>
String firstUrl = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"http://www.bxwx.org/b/5/5131/832882.html"</span>;
Article article = UtilMethod.getArticle(firstUrl);
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">while</span>(article.getNextUrl() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && article.getContent() != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">null</span> && !article.getId().equals(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"996627"</span>)){
article = UtilMethod.getArticle(article.getNextUrl());
System.out.println(article.getId()+<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"----"</span>+article.getTitle());
}
}
}
</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li></ul>
运行GetArticles方法,可以看到类似如下的效果:
<code class="hljs brainfuck has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832883</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第二章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">沧海桑田</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832884</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第三章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">人世悠悠</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832885</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第四章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">无解之谜</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832886</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第五章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">惊艳</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832887</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第六章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">公主</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832888</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第七章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">小恶魔</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832889</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第八章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">烈火仙莲</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">832890</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">-</span><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第一卷</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">从神墓中走出</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">第九章</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">百丈巨蛇</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">.</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>
到这里该代码就告一段落了。后续的就不再进行了,比如说将小说插入数据库,然后自己搭建一个小说站点,或者将小说写入到一个txt文档中,不用再看那烦人的广告了。等等。。这里就不再进行了。
最后代码附上。
5016

被折叠的 条评论
为什么被折叠?



