How to Scrape Web Data the Simple Way

野生嵌入式

Category: python. Tags: python, web scraping

Article link: https://blog.youkuaiyun.com/qq_40809494/article/details/115305472

Here is the complete code first. It is short, only a few lines.

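import re
import time

import requests

t = 100  # counter shown at the start of each output line
while True:
    response = requests.get('http://www.dyhjw.com/guojijin.html')  # fetch the page to scrape
    html = response.text  # the page's HTML source as a string
    # (.*?) marks the value to capture
    m = re.findall(' <td class="cor"><span class="data_1 last">(.*?)</span></td>', html)
    t = t + 1
    if t > 999:  # wrap around so the counter stays at three digits
        t = 100
    print(t, "Live gold price:", m[4], "yuan/gram")  # print the scraped value
    time.sleep(1)  # wait one second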

Let's go through it piece by piece.
First, import the three libraries:

import requests
import re
import time

Next, specify the page you want to scrape:

response = requests.get('http://www.dyhjw.com/guojijin.html')  # fetch the page to scrape
html = response.text  # the page's HTML source as a string
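The two lines above assume the request always succeeds. As a rough sketch of a more cautious fetch, the helper below adds a timeout and an HTTP status check. The name `fetch_html`, the `get` parameter (there only so the function can be tried without a network connection), the 10-second timeout, and the encoding fallback are all assumptions for illustration, not part of the original script.

```python
import requests

URL = "http://www.dyhjw.com/guojijin.html"  # the page used in this article


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page source as text, raising an exception on HTTP errors."""
    response = get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into an exception
    # Assumption: if no charset was declared, requests falls back to ISO-8859-1
    # for text responses, so let it guess the encoding from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


if __name__ == "__main__":
    html = fetch_html(URL)
    print(len(html), "characters downloaded")
```

Without a timeout, `requests.get` can wait indefinitely on an unresponsive server, which matters in a loop that runs once per second.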

This step pins down where the data you want sits in the page:

m = re.findall(' <td class="cor"><span class="data_1 last">(.*?)</span></td>', html)  # (.*?) marks the value to capture

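How do you find that position? Take this site as an example, and suppose we want the live gold price.
Step 1: Select the gold price on the page.
Step 2: Right-click and choose Inspect; the page's HTML source appears.
Step 3: Copy that line and replace the gold price with (.*?).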
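The three steps can be tried offline on a small piece of HTML. This is only a sketch: the prices `455.10` and `456.20` and the two-cell `sample_html` are made up for illustration, and the literal parts are passed through `re.escape` as a precaution the original script does not take.

```python
import re

# Hypothetical line copied from the Inspect panel, with a made-up price
copied_line = '<td class="cor"><span class="data_1 last">455.10</span></td>'

# Step 3: replace the price with (.*?), escaping the literal text around it
prefix, suffix = copied_line.split("455.10")
pattern = re.escape(prefix) + "(.*?)" + re.escape(suffix)

# Made-up page fragment containing two cells of the same shape
sample_html = (
    '<tr><td class="cor"><span class="data_1 last">455.10</span></td>'
    '<td class="cor"><span class="data_1 last">456.20</span></td></tr>'
)

prices = re.findall(pattern, sample_html)
print(prices)  # one captured string per matching cell
```

The `?` makes the capture non-greedy, so each match stops at the first closing `</span></td>`. With a greedy `(.*)`, the two cells on this one line would be swallowed into a single match.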
Then just print the price you extracted.

print(t, "Live gold price:", m[4], "yuan/gram")  # print the scraped value
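The number at the start of each line comes from the counter `t`, which the loop increments and resets once it passes 999. A minimal sketch of that logic in isolation follows. The helper names `next_counter` and `format_line` and the English label text are assumptions for illustration.

```python
def next_counter(t, low=100, high=999):
    """Advance the counter as the loop does: add 1, reset to `low` once past `high`."""
    t += 1
    return low if t > high else t


def format_line(t, price):
    """Build the output line: counter, label, price, unit."""
    return f"{t} Live gold price: {price} yuan/gram"


t = 100
for _ in range(3):
    t = next_counter(t)
    print(format_line(t, "455.10"))  # "455.10" is a made-up price
```

Because the counter is incremented before printing, the first line shows 101, and 100 only appears after the counter wraps around.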

Finally, let's look at the result.
[Image: screenshot of the program output]
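One weak spot in the script is `m[4]`: if the page layout changes or a request fails, the loop stops with an exception. The sketch below separates parsing from polling and returns `None` when the expected match is missing. The names `extract_price` and `poll` and the `fetch` parameter are assumptions for illustration; `fetch` stands for any function that returns the page's HTML, such as one wrapping `requests.get(...).text`. The pattern also omits the leading space used in the article's version.

```python
import re
import time

PATTERN = re.compile(r'<td class="cor"><span class="data_1 last">(.*?)</span></td>')


def extract_price(html, index=4):
    """Return the captured value at `index`, or None if the page has too few matches."""
    matches = PATTERN.findall(html)
    return matches[index] if len(matches) > index else None


def poll(fetch, rounds, delay=1.0):
    """Call fetch() `rounds` times and collect the extracted prices."""
    prices = []
    for _ in range(rounds):
        try:
            price = extract_price(fetch())
        except Exception as exc:  # for example, a network error raised by fetch()
            print("fetch failed:", exc)
            price = None
        prices.append(price)
        time.sleep(delay)
    return prices
```

Passing `fetch` in as a parameter keeps the parsing logic testable with a fixed HTML string, without touching the network.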