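1. Download the TFHpple source code from GitHub.
2. Add the six source files from the download to your project (the .h/.m pairs for TFHpple, TFHppleElement, and XPathQuery).
3. Link libxml2.2.dylib to the project, and add the libxml2 header directory to the project's "Header Search Paths" build setting.
4. You can now start writing the HTML parsing code.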
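The build settings from step 3 can also be written as an xcconfig fragment. This is only a sketch: the header path below is the location commonly used for the libxml2 headers in the SDK and is an assumption, not a value taken from this post, so check it against your own Xcode installation.

```xcconfig
// Assumed location of the libxml2 headers inside the active SDK.
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2

// Link against libxml2 (equivalent to adding the libxml2 dylib in the target's link phase).
OTHER_LDFLAGS = $(inherited) -lxml2
```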
Target HTML:
<table cellpadding="4" width="98%" border="1" class="tb">
<thead class="tbhead">
<tr>
<td width="5%">No.<input type="checkbox" title="Select all / none" name="searchresult_all_cb" id="searchresult_all_cb" onclick="chooseall(this);"/></td>
<td width="33%">Title</td>
<td width="19%">Author</td>
<td width="13%">Publisher</td>
<td width="7%">Year</td>
<td width="7%">Call number</td>
<td width="5%">Holdings</td>
<td width="5%">Available</td>
<td width="6%">Related resources</td>
</tr>
</thead>
<tbody>
<tr>
<td><input type="checkbox" name="searchresult_cb" value="443746" onclick="savethis(this);"/>1</td>
<td><span class="title"><a href="bookinfo.aspx?ctrlno=443746" target="_blank">iOS 6开发进阶与实战 [专著]=More iOS 6 development:further explorations of iOS SDK</a></span></td>
<td>(美)马克(Dave Mark) ... [等] 著;麦秆创智译</td>
<td>人民邮电出版社</td>
<td>2013.10</td>
<td class="tbr">TN929.53/M150</td>
<td class="tbr">3</td>
<td class="tbr">2</td>
<td>
</td>
</tr>
<tr>
<td><input type="checkbox" name="searchresult_cb" value="443708" onclick="savethis(this);"/>2</td>
<td><span class="title"><a href="bookinfo.aspx?ctrlno=443708" target="_blank">iOS网络编程与云端应用最佳实践 [专著]=iOS developing insights: network and icloud</a></span></td>
<td>关东升著</td>
<td>清华大学出版社</td>
<td>2013</td>
<td class="tbr">TN929.53/G776</td>
<td class="tbr">3</td>
<td class="tbr">2</td>
<td>
</td>
</tr>
</tbody>
</table>
HTML parsing code:
- (void)parseSearchResult:(NSData *)result
{
    TFHpple *doc = [TFHpple hppleWithHTMLData:result];

    // Collect every <tr> element in the document.
    NSArray *TRElements = [doc searchWithXPathQuery:@"//tr"];

    BOOL isFirstRow = YES;
    for (TFHppleElement *tempTRElement in TRElements) {
        // Skip the first <tr>: it is the header row inside <thead>.
        if (isFirstRow) {
            isFirstRow = NO;
            continue;
        }

        // Collect the <td> children of this row.
        NSArray *TDElements = [tempTRElement childrenWithTagName:@"td"];
        for (TFHppleElement *tempTDElement in TDElements) {
            // Log the text directly inside <td>xxx</td>, if there is any.
            if ([tempTDElement text] != nil) {
                NSLog(@"%@", [tempTDElement text]);
            }

            // Look for <a> elements inside this cell.
            NSArray *AElements = [tempTDElement searchWithXPathQuery:@"//a"];
            for (TFHppleElement *tempAElement in AElements) {
                // Value of the href attribute in <a href="xxx"></a>.
                NSLog(@"A-href:%@", [tempAElement objectForKey:@"href"]);
                // Text inside <a>xxx</a>.
                NSLog(@"A-text:%@", [tempAElement text]);
            }
        }
    }
}
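Logging each cell is fine for exploring the page, but an app usually needs the rows as data. Here is a minimal sketch of one way to do that, assuming the nine-column layout of the target HTML above. The function name BooksFromSearchResult and the dictionary keys are made up for this example and are not part of TFHpple; only the TFHpple calls already used in this post are relied on.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper, not part of TFHpple: turns the search-result table
// into an array of dictionaries. The keys are invented for this sketch.
static NSArray *BooksFromSearchResult(NSData *html)
{
    TFHpple *doc = [TFHpple hppleWithHTMLData:html];
    NSMutableArray *books = [NSMutableArray array];

    // Query only the rows inside <tbody>, so the header row in <thead>
    // is never returned and needs no skip logic.
    for (TFHppleElement *row in [doc searchWithXPathQuery:@"//tbody/tr"]) {
        NSArray *cells = [row childrenWithTagName:@"td"];
        if ([cells count] < 8) {
            continue; // Not a full data row (assumes the column layout shown above).
        }

        // The second cell holds <span class="title"><a href="...">title</a></span>.
        // Messaging nil returns nil, so a missing <span> or <a> is harmless.
        TFHppleElement *link = [[[cells objectAtIndex:1] firstChildWithTagName:@"span"]
                                   firstChildWithTagName:@"a"];

        // -text can return nil, and nil cannot be stored in a dictionary,
        // so fall back to an empty string.
        [books addObject:@{
            @"title":      [link text] ?: @"",
            @"href":       [link objectForKey:@"href"] ?: @"",
            @"author":     [[cells objectAtIndex:2] text] ?: @"",
            @"publisher":  [[cells objectAtIndex:3] text] ?: @"",
            @"year":       [[cells objectAtIndex:4] text] ?: @"",
            @"callNumber": [[cells objectAtIndex:5] text] ?: @"",
            @"holdings":   [[cells objectAtIndex:6] text] ?: @"",
            @"available":  [[cells objectAtIndex:7] text] ?: @""
        }];
    }
    return books;
}
```

Inside parseSearchResult: this could be called as NSArray *books = BooksFromSearchResult(result); and the array handed to a table view data source. Selecting //tbody/tr instead of //tr is a design choice that depends on the page keeping its <thead>/<tbody> split; if the markup changes, the skip-the-first-row approach above is the fallback.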
Usage example from the TFHpple GitHub page:
#import "TFHpple.h"
NSData * data = [NSData dataWithContentsOfFile:@"index.html"];
TFHpple * doc = [[TFHpple alloc] initWithHTMLData:data];
NSArray * elements = [doc searchWithXPathQuery:@"//a[@class='sponsor']"];
TFHppleElement * e = [elements objectAtIndex:0];
[e text]; // The text inside the HTML element (the content of the first text node)
[e tagName]; // "a"
[e attributes]; // NSDictionary of href, class, id, etc.
[e objectForKey:@"href"]; // Easy access to single attribute
[e firstChildWithTagName:@"b"]; // The first "b" child node