[Python] Checking your site's popularity [Updated 2006-10-08]


License: CC 4.0 BY-SA


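A long time ago Che Dong wrote an article, "How to evaluate a website's popularity (Link Popularity Check)" (http://www.chedong.com/tech/link_pop_check.html), which explains how to gauge a site's "popularity" using some lesser-known search engine commands.

This is actually easy to do in Python.

We implement it with an approach he did not mention; del.icio.us probably did not exist yet when he wrote the article. [Update 2:41] We also added support for counting backlinks through the alltheweb.com search engine.

The Python version of getURLRank is built on this idea:
a site's popularity can be judged in many ways. One is to look at how many people have bookmarked the page on del.icio.us. Another is to count the backlinks reported by google/alltheweb.com/msn/sogou/baidu.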

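The code is below; it ran successfully under Python 2.5c.

[Update 2:41] At 2:30 AM, added support for the alltheweb.com search engine and, following pandaxiaoxi's suggestion, added a getattr-based dispatcher.

[Update 20:43] Added support for the google.com/search.msn.com/sogou.com search engines and, following limodou's suggestion, added dictionary-based replacement.

[Update 23:00] Following limodou's suggestion, URLs are now encoded with urllib.quote_plus().

[Update 16:46] Added support for checking Inlinks through siteexplorer.search.yahoo.com.

[Update 2006-10-08] Added support for checking how much of a blog Baidu has indexed.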
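To illustrate why the 23:00 update moved from a replacement table to quote_plus(), here is a minimal sketch. It is written for Python 3, where the function lives in urllib.parse (the script itself targets Python 2 and calls urllib.quote_plus). The helper names encode_by_table and build_query_url are made up for this example; the replacement pairs mirror the commented-out aReplaceURL list in the script.

```python
from urllib.parse import quote_plus

# Hand-written replacement table, mirroring the commented-out aReplaceURL list.
# It only covers the two patterns listed here.
REPLACEMENTS = [("://", "%3A%2F%2F"), ("/", "%2F")]


def encode_by_table(url):
    """Apply each (old, new) pair in order (hypothetical helper)."""
    for old, new in REPLACEMENTS:
        url = url.replace(old, new)
    return url


def build_query_url(prefix, target):
    """Append the percent-encoded target URL to a query prefix (hypothetical helper)."""
    return prefix + quote_plus(target)


target = "http://blog.sina.com.cn/m/xujinglei"
print(encode_by_table(target))
print(build_query_url("http://del.icio.us/url/check?url=", target))

# A URL with its own query string shows the difference:
# the table leaves '?', '=' and '&' untouched, quote_plus() encodes them.
tricky = "http://example.com/?a=1&b=2"
print(encode_by_table(tricky))
print(quote_plus(tricky))
```

For a plain URL both approaches give the same result, but the table silently passes through any character it does not list, which would corrupt the outer query string.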

File name: getURLRank.py

Running the program produces the following output (the Chinese lines are the script's own messages):
>>python geturlrank.py http://blog.sina.com.cn/m/xujinglei

enter parse_delicious function...
del.ici.ous query url:http://del.icio.us/url/check?url=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
get URL : http://del.icio.us/url/check?url=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
有多少个人收藏了你的url呢:
148
out parse_delicious function...
enter parse_google function...
google query url:http://www.google.com/search?hl=en&q=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
google 有多少个反向链接呢:
5760
out parse_google function...
enter parse_alltheweb function...
Alltheweb query url:http://www.alltheweb.com/urlinfo?_sb_lang=any&q=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
get URL : http://www.alltheweb.com/urlinfo?_sb_lang=any&q=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
Alltheweb 有多少个反向链接呢:
1
out parse_alltheweb function...
enter parse_msn function...
msn query url:http://search.msn.com/results.aspx?FORM=QBRE&q=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
msn 有多少个反向链接呢:
217107
out parse_msn function...
enter parse_sogou function...
sogou query url:http://www.sogou.com/web?num=10&query=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
www.sogou.com 评分是多少呢:
66
out parse_sogou function...
enter parse_yahoo function...
yahoo siteexplorer query url:http://siteexplorer.search.yahoo.com/search?bwm=i&bwms=p&bwmf=u&fr=FP-tab-web-t500&fr2=seo-rd-se&p=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
siteexplorer.search.yahoo.com Inlinks是多少呢:
228334
out parse_yahoo function...

代码为：

# -*- coding: UTF-8 -*-
##
## getURLRank  documentation generated: 2006.09.02
##
## (1) Overview:
## Module name: getURLRank.py
## Module description:
##   Takes a site address supplied by the user and checks how many people have bookmarked it
##   and how many backlinks it has, as a measure of the site's popularity.
## Subsystem: getURLRank
##
## System description:
##   The Python version of getURLRank is built on this idea:
##   a site's popularity can be judged in many ways, for example by how many people have bookmarked the page on del.icio.us,
##   or by how many backlinks google.com/alltheweb.com report.
## Usage:
##   python getURLRank.py            -- check all popularity metrics for my blog (delicious/google/alltheweb/msn/sogou/siteexplorer.search.yahoo.com)
##   python getURLRank.py delicious  -- check how often my blog is bookmarked on del.icio.us
##   python getURLRank.py alltheweb  -- check how many backlinks alltheweb.com finds for my blog
##   python getURLRank.py google     -- check how many backlinks google.com finds for my blog
##   python getURLRank.py msn        -- check how many backlinks search.msn.com finds for my blog
##   python getURLRank.py sogou      -- check the SogouRank that www.sogou.com gives my blog
##   python getURLRank.py yahoo      -- check the Inlinks that siteexplorer.search.yahoo.com has collected for my blog
##
##   python getURLRank.py http://blog.youkuaiyun.com            -- check all popularity metrics for the csdn blog (delicious/google/alltheweb/msn/sogou)
##   python getURLRank.py http://blog.youkuaiyun.com alltheweb  -- check how many backlinks alltheweb finds for the csdn blog
##   python getURLRank.py http://blog.youkuaiyun.com google     -- check how many backlinks google finds for the csdn blog
##   python getURLRank.py http://blog.youkuaiyun.com msn        -- check how many backlinks search.msn.com finds for the csdn blog
##   python getURLRank.py http://blog.youkuaiyun.com sogou      -- check the SogouRank of the csdn blog
##
## (2) History:
## Created by: zhengyun_ustc (September 3, 2006, 1:00 AM)
## Modified by: zhengyun_ustc (September 3, 2006, 2:30 AM: added support for the alltheweb.com search engine and, following pandaxiaoxi's suggestion, a getattr-based dispatcher)
## Modified by: zhengyun_ustc (September 3, 2006, 4:30 PM: added support for the google.com and search.msn.com search engines)
## Modified by: zhengyun_ustc (September 3, 2006, 5:10 PM: added SogouRank checking through www.sogou.com)
## Modified by: zhengyun_ustc (September 5, 2006, 2:10 PM: added Inlinks checking through siteexplorer.search.yahoo.com)
## Contact: Google Talk >> zhengyun(at)gmail.com
## Blogs: http://blog.youkuaiyun.com/zhengyun_ustc/
## You may learn from the code of this Python version of getURLRank by zhengyun_ustc, but it may not be used commercially without authorization from zhengyun_ustc.

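from sgmllib import SGMLParser
import os,sys,re
import socket
import httplib
import urllib
import urllib2
from xml.dom import minidom

name = 'zhengyun'
google_email = 'your@gmail.com'
google_password = 'pass'

# The substitutions could be kept in a list or dict and applied in a loop:
# aReplaceURL = [('://','%3A%2F%2F'), ('/','%2F')]

# Google states: "Without Google's express prior permission, you may not send automated queries of any kind to Google's systems.
# Note that 'automated queries' include using software to send queries to Google to determine a website's Google ranking for various searches."
# So to use certain Google services, such as Calendar, we have to log in first.
# Plain search queries, however, do not require a login.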

def initGoogleLogin(email, passwd):
    params = urllib.urlencode({'Email': email,
                               'Passwd': passwd,
                               'service': 'cl',
                               'source': 'upbylunch-googie-0'})
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    print 'enter initGoogleLogin function...'
    conn = httplib.HTTPSConnection("www.google.com")
    conn.request("POST", "/accounts/ClientLogin", params, headers)
    response = conn.getresponse()
    data = response.read()
    # If the login fails Google returns 403
    if response.status == 403:
        google_auth = None
    else:
        google_auth = data.splitlines()[2].split('=')[1]
        print 'google_auth=' + google_auth
    conn.close()
    return google_auth

class GoogleClRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, hdrs):
        result = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, hdrs)
        result.status = code
        return result

    def http_error_201(self, req, fp, code, msg, hdrs):
        return 'Success'

    def http_error_302(self, req, fp, code, msg, hdrs):
        result = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, hdrs)
        result.status = code
        return hdrs.dict['location']

# Fetch a web page and return its content
def getURLContent(url):
    print "get URL : %s" % url
    f = urllib.urlopen(url)
    data = f.read()
    f.close()
    return data

# Check how many people have bookmarked your URL on del.icio.us
def parse_delicious(url):
    print 'enter parse_delicious function...'
    # URL prefix that del.icio.us uses to check whether a link has been bookmarked
    delicousPrefixURL = 'http://del.icio.us/url/check?url='
    #for i, j in aReplaceURL:
    #    postData = url.replace(i, j)
    postData = delicousPrefixURL + urllib.quote_plus(url)
    print 'del.ici.ous query url:' + postData
    # data holds the HTML of the del.icio.us result page
    data = getURLContent(postData)
    #print data

    # Use the regular expression
    #     url\s*has\s*been\s*saved\s*by(\s*?[^p]*\s*)people
    # to extract how many people have bookmarked the URL:
    pParsere = re.compile(r'url\s*has\s*been\s*saved\s*by(\s*?[^p]*\s*)people')
    #print pParsere
    # pParsere.match(data) does not work here, because match() only tries the start of the string; use pParsere.search(data)
    matchDelicious = pParsere.search(data)
    if (matchDelicious):
        #print matchDelicious
        print '有多少个人收藏了你的url呢:'  # "How many people have bookmarked your URL:"
        # group(0) would be the whole match, e.g. "url has been saved by 2 people";
        # group(1) is just the number: 2
        print matchDelicious.group(1).replace(',', '')
    else:
        # Otherwise use the regular expression
        #     There\s*is\s*no\s*del.icio.us\s*history\s*for\s*this\s*url
        # to confirm that nobody has bookmarked the URL:
        pParsere = re.compile(r'There\s*is\s*no\s*del.icio.us\s*history\s*for\s*this\s*url')
        #print pParsere
        # Again, search() is required here rather than match()
        matchDelicious = pParsere.search(data)
        if (matchDelicious):
            print '是不是没有人收藏?是否解析成功呢？:'  # "Nobody bookmarked it? Did parsing succeed?:"
            print matchDelicious
            print '没有人收藏这个url!'  # "Nobody has bookmarked this URL!"
    print 'out parse_delicious function...'

# Check how many backlinks Alltheweb reports for your URL
def parse_alltheweb(url):
    print 'enter parse_alltheweb function...'
    # URL prefix for the Alltheweb backlink query.
    # The _sb_lang=any parameter is extremely important!
    # Without it, alltheweb searches English pages only, which distorts the result: my blog gets only 12 hits.
    # Only when the request includes "_sb_lang=any&" does it search all languages, and my blog then gets 212 hits.
    AllthewebPrefixURL = 'http://www.alltheweb.com/urlinfo?_sb_lang=any&q='
    #for i, j in aReplaceURL:
    #    postData = url.replace(i, j)
    postData = AllthewebPrefixURL + urllib.quote_plus(url)
    print 'Alltheweb query url:' + postData
    # data holds the HTML of the Alltheweb result page
    data = getURLContent(postData)
    #print data

    # Use the regular expression
    #     (? [^<]*)
    # to extract the number of backlinks.
    # NOTE: the HTML tags and the group name that belong to this pattern are missing
    # from the published listing, so the pattern below is incomplete as shown.
    pParsere = re.compile(r'(?P [^<]+)')
    #print pParsere
    # pParsere.match(data) does not work here; use pParsere.search(data)
    matchAlltheweb = pParsere.search(data)
    if (matchAlltheweb):
        #print matchAlltheweb
        print 'Alltheweb 有多少个反向链接呢:'  # "How many backlinks does Alltheweb report:"
        # group(1) is just the number, e.g. 212
        print matchAlltheweb.group(1).replace(',', '')
    else:
        # Otherwise use the regular expression
        #     No\s*Web\s*pages\s*found\s*that\s*match\s*your\s*query
        # to confirm that there are no backlinks:
        pParsere = re.compile(r'No\s*Web\s*pages\s*found\s*that\s*match\s*your\s*query')
        #print pParsere
        # Again, search() is required here rather than match()
        matchAlltheweb = pParsere.search(data)
        if (matchAlltheweb):
            print '是不是没有反向链接?:'  # "No backlinks?:"
            print matchAlltheweb
            print '这个url没有反向链接!'  # "This URL has no backlinks!"
    print 'out parse_alltheweb function...'

# Parse a Google result page and return the number of links

def parseGoogleResult(page):

    m = re.search('(?<= of about )([0-9]|,)+', page)
    return m.group(0).replace(',','')

# Check how many backlinks Google reports for your URL
def parse_google(url):
    print 'enter parse_google function...'

    # First simulate a Google login to obtain the auth token:
    #google_auth = initGoogleLogin(google_email, google_password)

    # URL prefix for the Google backlink (link:) query
    GooglePrefixURL = 'http://www.google.com/search?hl=en&q=link%3A'
    #for i, j in aReplaceURL:
    #    postURL = url.replace(i, j)
    postURL = GooglePrefixURL + urllib.quote_plus(url)
    print 'google query url:' + postURL

    # Some Google services reject queries sent by software,
    # so for those we would log in with our own Google account first
    #headers = {"Authorization": "GoogleLogin auth=%s" % google_auth,
    #               "Content-type": "text/html; charset=UTF-8"}
    #print headers

    # data holds the HTML of the Google result page
    request = urllib2.Request(postURL)
    request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener(GoogleClRedirectHandler)
    data = opener.open(request).read()
    #print data

    # Use the regular expression
    #    (?<= of about )([0-9]|,)+
    # to extract the number of backlinks:
    matchGoogleLinks = re.search('(?<= of about )([0-9]|,)+',data)
    if(matchGoogleLinks):
        print 'google 有多少个反向链接呢:'  # "How many backlinks does Google report:"
        print matchGoogleLinks.group(0).replace(',','')
    else:
        # Otherwise use the regular expression
        #    did\s*not\s*match\s*any\s*documents.
        # to confirm that there are no backlinks:
        pParsere = re.compile(r'did\s*not\s*match\s*any\s*documents.')
        #print pParsere
        # pParsere.match(data) does not work here; use pParsere.search(data)
        matchGoogleLinks = pParsere.search(data)
        if(matchGoogleLinks):
            print '是不是没有google反向链接?:'  # "No Google backlinks?:"
            #print matchGoogleLinks
            print 'url确实没有google反向链接!'  # "The URL really has no Google backlinks!"

    print 'out parse_google function...'

# Check how many backlinks MSN reports for your URL
def parse_msn(url):
    print 'enter parse_msn function...'

    # URL prefix for the MSN backlink (link:) query
    PrefixURL = 'http://search.msn.com/results.aspx?FORM=QBRE&q=link%3A'
    #for i, j in aReplaceURL:
    #    postURL = url.replace(i, j)
    postURL = PrefixURL + urllib.quote_plus(url)
    print 'msn query url:' + postURL

    # data holds the HTML of the MSN result page
    request = urllib2.Request(postURL)
    request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    #print data

    # Use the regular expression
    #    Page\s*1\s*of(\s*?[^p]*\s*)results\s*containing
    # to extract the number of backlinks:
    matchLinks = re.search(r'Page\s*1\s*of(\s*?[^p]*\s*)results\s*containing',data)
    if(matchLinks):
        print 'msn 有多少个反向链接呢:'  # "How many backlinks does MSN report:"
        # Note: matchLinks.group(0) would print the whole phrase "Page 1 of 326 results containing"
        print matchLinks.group(1).replace(',','')
    else:
        # Otherwise use the regular expression
        #    We\s*couldn't\s*find\s*any\s*results\s*containing
        # to confirm that there are no backlinks.
        # The pattern contains an apostrophe, so it is written with double quotes.
        pParsere = re.compile(r"We\s*couldn't\s*find\s*any\s*results\s*containing")
        #print pParsere
        # pParsere.match(data) does not work here; use pParsere.search(data)
        matchLinks = pParsere.search(data)
        if(matchLinks):
            print '是不是没有msn反向链接?:'  # "No MSN backlinks?:"
            print 'url确实没有msn反向链接!'  # "The URL really has no MSN backlinks!"

    print 'out parse_msn function...'

# Check the SogouRank that Sogou gives your URL
def parse_sogou(url):
    print 'enter parse_sogou function...'

    # URL prefix for the www.sogou.com backlink (link:) query
    PrefixURL = 'http://www.sogou.com/web?num=10&query=link%3A'
    #for i, j in aReplaceURL:
    #    postURL = url.replace(i, j)
    postURL = PrefixURL + urllib.quote_plus(url)
    print 'sogou query url:' + postURL

    # data holds the HTML of the Sogou result page
    request = urllib2.Request(postURL)
    request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    #print data

    # Use the regular expression
    #     ]*width="([^%]*)
    # to extract the score Sogou gives this page.
    # NOTE: the start of this pattern (an HTML tag) is missing from the published listing.
    matchLinks = re.search(r']*width="([^\%]*)',data)
    if(matchLinks):
        print 'www.sogou.com 评分是多少呢:'  # "What score does www.sogou.com give:"
        # Note: matchLinks.group(0) would print the whole matched text
        print matchLinks.group(1)
    else:
        # Otherwise search for the message
        #     抱歉，没有找到指向网页   ("Sorry, no pages were found pointing to the page")
        # to confirm that there are no backlinks:
        pParsere = re.compile('抱歉，没有找到指向网页')
        #print pParsere
        # pParsere.match(data) does not work here; use pParsere.search(data)
        matchLinks = pParsere.search(data)
        if(matchLinks):
            print '是不是没有sogou反向链接?:'  # "No Sogou backlinks?:"
            print 'url确实没有sogouRank!'  # "The URL really has no SogouRank!"

    print 'out parse_sogou function...'

# Check the Inlinks figure that Yahoo! Site Explorer reports for your URL
def parse_yahoo(url):
    print 'enter parse_yahoo function...'

    # URL prefix for the siteexplorer.search.yahoo.com inlinks query
    PrefixURL = 'http://siteexplorer.search.yahoo.com/search?bwm=i&bwms=p&bwmf=u&fr=FP-tab-web-t500&fr2=seo-rd-se&p='
    #for i, j in aReplaceURL:
    #    postURL = url.replace(i, j)
    postURL = PrefixURL + urllib.quote_plus(url)
    print 'yahoo siteexplorer query url:' + postURL

    # data holds the HTML of the Yahoo! result page
    request = urllib2.Request(postURL)
    request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    #print data

    # Use the regular expression
    #     \s*of\s*about\s*(\s*?[^<]*)
    # to extract the Inlinks count that Yahoo! reports for this page:
    matchLinks = re.search(r'\s*of\s*about\s*(\s*?[^<]*)',data)
    if(matchLinks):
        print 'siteexplorer.search.yahoo.com Inlinks是多少呢:'  # "How many Inlinks does siteexplorer.search.yahoo.com report:"
        # Note: matchLinks.group(0) would print the whole matched text
        print matchLinks.group(1).replace(',', '')
    else:
        # Otherwise use the regular expression
        #     We\s*were\s*unable\s*to\s*find\s*any\s*results\s*for\s*the\s*given\s*URL\s*in\s*our\s*index
        # to confirm that there are no Inlinks:
        pParsere = re.compile(r'We\s*were\s*unable\s*to\s*find\s*any\s*results\s*for\s*the\s*given\s*URL\s*in\s*our\s*index')
        #print pParsere
        # pParsere.match(data) does not work here; use pParsere.search(data)
        matchLinks = pParsere.search(data)
        if(matchLinks):
            print '是不是没有siteexplorer.search.yahoo.com Inlinks?:'  # "No Inlinks?:"
            print 'url确实没有siteexplorer.search.yahoo.com Inlinks!'  # "The URL really has no Inlinks!"

    print 'out parse_yahoo function...'

#
# Send the given site URL to each "big brother" (del.icio.us, Alltheweb, Google, MSN, Sogou and so on)
# and pull the wanted number out of the page that comes back.
def postURL2BigBrother(url = "http://blog.youkuaiyun.com/zhengyun_ustc", bigbrother = ""):
    #sys.setdefaultencoding('utf-8')

    if (len(bigbrother) > 0):
        method_name = "parse_%s" % bigbrother
        RankMethod = __import__("getURLRank")
        # getattr returns a reference to a function whose name is only known at run time
        method = getattr(RankMethod, method_name)
        method(url)
    else:
        parse_delicious(url)
        parse_google(url)
        parse_alltheweb(url)
        parse_msn(url)
        parse_sogou(url)
        parse_yahoo(url)

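# Entry point
if __name__ == '__main__':
    argc = len(sys.argv)
    if (argc == 3):
        print sys.argv[1]
        print sys.argv[2]
        # postURL2BigBrother acts as a dispatcher: the extra argument, e.g. bigbrother='delicious' or bigbrother='alltheweb', selects the parser
        # http://www.woodpecker.org.cn/diveintopython/power_of_introspection/getattr.html
        postURL2BigBrother(sys.argv[1], sys.argv[2])
    elif (argc == 2):
        if sys.argv[1].startswith('http'):
            # A single URL argument: run every check against that URL
            postURL2BigBrother(sys.argv[1])
        else:
            # A single engine name: check the default blog with that engine only
            postURL2BigBrother("http://blog.youkuaiyun.com/zhengyun_ustc", sys.argv[1])
    else:
        postURL2BigBrother()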
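
The getattr dispatch used by postURL2BigBrother can be tried out in isolation. The following is only a minimal sketch in Python 3 syntax, not the script's own code: the RankCheckers class, its stub methods and the dispatch() function are hypothetical names invented for this example, and the stubs return strings instead of querying any site.

```python
class RankCheckers(object):
    """Holds one parse_<engine> method per supported engine (stubs only)."""

    def parse_delicious(self, url):
        return "delicious:" + url

    def parse_google(self, url):
        return "google:" + url


def dispatch(url, engine, checkers=None):
    """Resolve parse_<engine> by name at run time and call it with url."""
    if checkers is None:
        checkers = RankCheckers()
    # getattr with a default avoids an AttributeError for unknown names
    handler = getattr(checkers, "parse_%s" % engine, None)
    if not callable(handler):
        raise ValueError("unknown engine: %r" % engine)
    return handler(url)


print(dispatch("http://example.com", "google"))
try:
    dispatch("http://example.com", "nosuchengine")
except ValueError as err:
    print(err)
```

Passing a default to getattr turns a mistyped engine name into a clear error message instead of an AttributeError traceback, which is one way the command-line handling could be made friendlier.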

[Code added in the 2006-10-08 update]

# Check how much of your blog Baidu has indexed
def parse_baidu(url):
    print 'enter parse_baidu function...'

    # URL prefix for the Baidu domain: query
    PrefixURL = 'http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=domain%3A'
    #for i, j in aReplaceURL:
    #    postURL = url.replace(i, j)
    postURL = PrefixURL + urllib.quote_plus(url)
    print 'baidu query url:' + postURL

    # data holds the HTML of the Baidu result page
    request = urllib2.Request(postURL)
    request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener(GoogleClRedirectHandler)
    data = opener.open(request).read()
    #print data

    # Extract the count with a regular expression; the page is decoded from cp936 (GBK) first.
    # The Chinese text in the pattern is the phrase Baidu prints before the count ("found about ... related pages").
    matchLinks = re.search(u'(?<=找到相关网页约)([0-9]|,)+', unicode(data,'cp936'))
    if(matchLinks is None):
        matchLinks = re.search(u'(?<=找到相关网页)([0-9]|,)+', unicode(data,'cp936'))
    print matchLinks
    if(matchLinks):
        print 'baidu 有多少个反向链接呢:'  # "How many links does Baidu report:"
        print matchLinks.group().replace(',','')
    else:
        # Otherwise use the regular expression
        #    抱歉，没有找到与\s*[^”]*   ("Sorry, nothing was found related to ...")
        # to confirm that Baidu has indexed nothing:
        pParsere = re.compile(u'抱歉，没有找到与\s*[^”]*')
        # pParsere.match(data) does not work here; use pParsere.search(data)
        matchLinks = pParsere.search(unicode(data,'cp936'))
        if(matchLinks):
            print '是不是没有 baidu 收录呢?:'  # "Not indexed by Baidu?:"
            #print matchLinks
            print 'url确实没有 baidu 收录 !'  # "The URL really is not indexed by Baidu!"

    print 'out parse_baidu function...'
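
The comments in the listing keep repeating that match() fails where search() works, and several parsers pull a count out with a lookbehind pattern. The sketch below shows both points in Python 3 syntax. It is an illustration under assumptions: extract_count and the two sample pages are made up for this example and are not captured search engine output.

```python
import re

# Lookbehind patterns: the match itself contains only digits and commas.
GOOGLE_COUNT = re.compile(r"(?<= of about )[0-9,]+")
BAIDU_COUNT = re.compile("(?<=找到相关网页约)[0-9,]+")


def extract_count(page, pattern):
    """Return the first count matched by pattern as an int, or None."""
    found = pattern.search(page)
    if found is None:
        return None
    # Strip the thousands separators before converting
    return int(found.group(0).replace(",", ""))


# Made-up sample pages for illustration only.
google_page = "<b>Results 1 - 10 of about 5,760 linking to example</b>"
# Simulate a page served as cp936 (GBK) bytes, as parse_baidu assumes.
baidu_bytes = "百度一下，找到相关网页约1,230篇".encode("cp936")

print(extract_count(google_page, GOOGLE_COUNT))
print(extract_count(baidu_bytes.decode("cp936"), BAIDU_COUNT))

# match() only tries position 0 of the string, so it finds nothing here,
# while search() scans the whole page.
print(GOOGLE_COUNT.match(google_page))
print(extract_count("no results here", GOOGLE_COUNT))
```

Returning None when nothing matches lets the caller fall through to the "no results" check, which is the same two-step structure each parse_ function follows.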