Finding the text for class="alcw4 alcw41" via the li tag

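This post walks through a concrete example of parsing a web page with Perl. The script sets up LWP::UserAgent, the module normally used to fetch page content, and parses the HTML with HTML::TreeBuilder (here through HTML::TreeBuilder::XPath, reading a saved copy of the page, 0526.txt) to extract the content of elements with a given class. It shows how to iterate over the page's li elements, collect their class values, and build an XPath expression for each value to locate and extract the data.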
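The script in this post reads a saved file, so the fetch step mentioned above never actually runs. As a rough sketch of how the two steps could be joined, the code below fetches a page and hands its content to the parser. The helper name li_classes and the command-line URL argument are assumptions made for illustration and are not part of the original script.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: parse an HTML string and return the distinct class
# values of its <li> elements, in document order.
sub li_classes {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;
    my (%seen, @classes);
    for my $li ($tree->find_by_tag_name('li')) {
        my $class = $li->attr('class');
        next unless defined $class && length $class;
        push @classes, $class unless $seen{$class}++;
    }
    $tree->delete;    # release the parse tree explicitly
    return @classes;
}

# Fetch only when a URL is passed on the command line (assumed usage).
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->env_proxy;
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    print "$_\n" for li_classes($res->decoded_content);
}
```

Keeping the parsing in its own sub means the same code can be fed either a fetched page or the contents of a local file such as 0526.txt.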
jrhmpt01:/root/lwp/0526# cat a2.pl 
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

# User agent for fetching pages over HTTP. This example does not call it,
# because it parses a saved copy of the page (0526.txt) instead.
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent("Mozilla/8.0");

# Parse the saved page into a tree that supports XPath queries.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("0526.txt");

# Collect each distinct class value found on an <li> element, in document order.
# A %seen hash replaces the deprecated smartmatch (~~) membership test.
my (@urlall, %seen);
foreach my $li ($tree->find_by_tag_name('li')) {
    my $class = $li->attr('class');
    next unless $class;
    print "\$_ is $class\n";
    push @urlall, $class unless $seen{$class}++;
}

print @urlall;
print "\n";

# For each class value, build an XPath expression and print the text of the
# matching <li> elements, e.g. /html/body//li[@class="alcw4 alcw41"].
foreach my $var (@urlall) {
    my $url = qq(/html/body//li[\@class="$var"]);
    print "\$url is $url\n";
    my @total = $tree->findvalues($url);
    print @total;
    print "\n";
}
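The XPath built in the loop compares @class against the complete attribute string, so it only matches when the classes appear in exactly that order and spacing. A minimal sketch of matching on a single class token instead is shown below. The helper name li_with_class_token and the inline sample markup (modeled on 0526.txt) are assumptions for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample markup modeled on 0526.txt (shortened).
my $html = <<'HTML';
<ul>
  <li class="alcw4 alcw41"><div>100<span>%</span></div></li>
  <li class="alcw4 alcw42"><div>200<span>%</span></div></li>
</ul>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Hypothetical helper: build an XPath that matches <li> elements carrying one
# class token. Padding both sides with spaces keeps "alcw4" from matching
# inside "alcw41".
sub li_with_class_token {
    my ($token) = @_;
    return qq{//li[contains(concat(" ", normalize-space(\@class), " "), " $token ")]};
}

my @one  = $tree->findnodes(li_with_class_token('alcw41'));
my @both = $tree->findnodes(li_with_class_token('alcw4'));
print scalar(@one), " ", scalar(@both), "\n";
```

With this sample, the token alcw41 should select only the first li, while alcw4 should select both.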
jrhmpt01:/root/lwp/0526# cat 0526.txt 
  <li class="alcw4 alcw41">
                        <div class="ajjbfb txdbfb bfb100">100<span>%</span></div>
                        <div class="ajjbfb txdbfb bfb100">200<span>%</span></div>
                    </li>


  <li class="alcw4 alcw42">
                        <div class="ajjbfb txdbfb bfb100">100<span>%</span></div>
                        <div class="ajjbfb txdbfb bfb100">200<span>%</span></div>
                        <div class="ajjbfb txdbfb bfb100">scan<span>huihui</span></div>
                    </li>

jrhmpt01:/root/lwp/0526# perl a2.pl 
$_ is alcw4 alcw41
$_ is alcw4 alcw42
alcw4 alcw41alcw4 alcw42
$url is /html/body//li[@class="alcw4 alcw41"]
100%200%
$url is /html/body//li[@class="alcw4 alcw42"]
100%200%scanhuihui
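
In the output above, the text of every div inside an li runs together (100%200%), because findvalues returns one string per matched li. If each value is needed separately, one option is to extend the XPath down to the div level. The sketch below assumes the same markup as 0526.txt, inlined here so that it runs on its own.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Second <li> from 0526.txt, wrapped in a <ul> for this standalone sketch.
my $html = <<'HTML';
<ul>
<li class="alcw4 alcw42">
  <div class="ajjbfb txdbfb bfb100">100<span>%</span></div>
  <div class="ajjbfb txdbfb bfb100">200<span>%</span></div>
  <div class="ajjbfb txdbfb bfb100">scan<span>huihui</span></div>
</li>
</ul>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Select the child <div> elements, so findvalues returns one string per <div>
# instead of one string per <li>.
my @values = $tree->findvalues('//li[@class="alcw4 alcw42"]/div');
print join(", ", @values), "\n";
```

Each element of @values then holds the text of a single div, which is easier to store or compare than the concatenated string.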

Reposted from: https://www.cnblogs.com/zhaoyangjian724/p/6199978.html
