Applying Regular Expressions in Python

iteye_19224

Published 2009-04-08 10:56:52


Tags: regular expressions, Python, C, C++, C#

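This article shows how to scrape table data from an HTML file with Python, then process and store the results. Regular expressions match specific tags to extract fields such as the name, category, and URL, and the records are saved by type.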



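import re

def process_table(path='wuqu.html'):
    # get_curl, curl_fetch and store are helpers defined elsewhere in the
    # original project; they are not shown in this post.
    with open(path, 'r') as f:
        content = f.read()
    # [\s\S] matches any character, newlines included; *? stops at the first </tr>.
    rows = re.findall(r'<tr>[\s\S]*?</tr>', content)
    print(len(rows))
    if not rows:  # findall returns an empty list, never None
        return
    for row in rows[1:]:  # the first <tr> is skipped, as in the original loop
        links = re.findall(r'<a[\s\S]*?</a>', row)
        if len(links) != 4:  # only rows with exactly four links are processed
            continue
        name = re.search(r'>(.*?)<', links[0]).group(1)      # name
        classify = re.search(r'>(.*?)<', links[1]).group(1)  # classify
        url = re.search(r'"/(.*?)"', links[3]).group(1)      # url
        kind = url.split('=')[-1]  # value after the last '=' in the URL
        print(kind)
        handle = get_curl()
        rv, mid, kind = curl_fetch(handle, url)
        handle.close()  # close every handle, not only the last one
        store(mid, name, url, classify, kind)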
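
The extraction step can be tried without the download and storage helpers. The sketch below is only an illustration under stated assumptions: `SAMPLE_HTML` is a made-up page (the real `wuqu.html` is not shown in the post), and `extract_records` and `group_by_type` are hypothetical names, not part of the original code. It applies the same non-greedy patterns, skips rows that do not contain four links, and groups the results by the value after the last `=` in the URL.

```python
import re
from collections import defaultdict

# Made-up sample page; the layout of the real wuqu.html is an assumption.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>Category</th><th>Info</th><th>Link</th></tr>
<tr>
  <td><a href="/song/1">Waltz No. 2</a></td>
  <td><a href="/cat/7">Waltz</a></td>
  <td><a href="/info/1">details</a></td>
  <td><a href="/dl/1?type=mid">get</a></td>
</tr>
<tr>
  <td><a href="/song/2">Tango Night</a></td>
  <td><a href="/cat/8">Tango</a></td>
  <td><a href="/info/2">details</a></td>
  <td><a href="/dl/2?type=mp3">get</a></td>
</tr>
</table>
"""

# re.DOTALL lets '.' match newlines, so a row may span several lines.
ROW_RE = re.compile(r'<tr>.*?</tr>', re.DOTALL)
LINK_RE = re.compile(r'<a\b.*?</a>', re.DOTALL)
TEXT_RE = re.compile(r'>(.*?)<')    # text between the tags of one link
HREF_RE = re.compile(r'"/(.*?)"')   # quoted path, without the leading slash


def extract_records(html):
    """Return one dict per data row that contains exactly four links."""
    records = []
    for row in ROW_RE.findall(html)[1:]:  # skip the header row
        links = LINK_RE.findall(row)
        if len(links) != 4:
            continue
        name = TEXT_RE.search(links[0])
        classify = TEXT_RE.search(links[1])
        href = HREF_RE.search(links[3])
        if not (name and classify and href):
            continue  # skip malformed rows instead of raising AttributeError
        url = href.group(1)
        records.append({
            'name': name.group(1),
            'classify': classify.group(1),
            'url': url,
            'type': url.split('=')[-1],
        })
    return records


def group_by_type(records):
    """Group records by their 'type' field."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record['type']].append(record)
    return dict(grouped)


if __name__ == '__main__':
    grouped = group_by_type(extract_records(SAMPLE_HTML))
    for kind, items in sorted(grouped.items()):
        print(kind, [item['name'] for item in items])
```

Regular expressions work here because the markup is small and regular. For pages with nested tables or inconsistent attributes, a real HTML parser is usually the more robust choice.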