How to Use a Python Web Crawler

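This post documents how I resolved a permission error when installing the bs4 and lxml libraries in a Python environment by using the --user option, and how I then fixed the bs4.FeatureNotFound error.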


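I was recently testing a script whose job is to scrape information about a number of cities from Qunar (去哪儿网) with a Python web crawler.

The code is as follows: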

from bs4 import BeautifulSoup
import pandas as pd
import requests

def get_static_url_content(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = requests.get(url, headers=headers)  # send a browser-like User-Agent
    content = req.text
    bsObj = BeautifulSoup(content, 'lxml')  # the 'lxml' parser requires the lxml package
    return bsObj

def get_city_id():
    url = 'http://travel.qunar.com/place/'
    bsObj = get_static_url_content(url)
    cat_url = []
    cat_name = []
    bs = bsObj.find_all('div', attrs={'class': 'sub_list'})  # every <div class="sub_list">

    for block in bs:
        links = block.find_all('a')
        for link in links:
            cat_name.append(link.text)  # link text is the city name
            cat_url.append(link.attrs['href'])  # href is the city page URL
    return cat_name, cat_url

city_name_list, city_url_list = get_city_id()
city = pd.DataFrame({'city_name': city_name_list, 'city_code': city_url_list})
city.to_csv('./输出文件/city.csv', encoding='utf_8_sig')  # the ./输出文件 folder must already exist; utf_8_sig writes a BOM

Running it in PyCharm with a Python 3 environment produced an error:

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
Traceback (most recent call last):
  File "E:/test8/去哪网城市.py", line 1, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

Process finished with exit code 1
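A traceback like the one above means the interpreter that ran the script cannot find the package. As a minimal sketch (the helper name `missing_modules` is made up for illustration and is not part of the original script), the standard library can report which interpreter is running and which of the script's dependencies it cannot import:

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the top-level module names the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Show which python.exe is actually running, then check the script's dependencies.
print("Interpreter:", sys.executable)
print("Missing:", missing_modules(["bs4", "lxml", "pandas", "requests"]))
```

Running this with the same interpreter that PyCharm uses makes it clear which environment needs the packages.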

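The current Python environment simply did not have the third-party bs4 package, so the fix was to install it. I ran the pip command below in cmd, but that failed too, this time with an access-denied error ([WinError 5] 拒绝访问, i.e. "Access is denied"):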

Microsoft Windows [版本 10.0.17134.1006]
(c) 2018 Microsoft Corporation。保留所有权利。

C:\Users\admin>cd C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts

C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4
Collecting bs4
Collecting beautifulsoup4 (from bs4)
  Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4, bs4
Could not install packages due to an EnvironmentError: [WinError 5] 拒绝访问。: 'C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\Lib\\site-packages\\soupsieve'
Consider using the `--user` option or check the permissions.

Note: C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts is the directory where pip.exe lives on my machine.
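An alternative to changing into the Scripts folder is to call pip through the interpreter itself, which guarantees the package goes to the environment that will run the script. The sketch below is only an illustration under that assumption; the function names `pip_install_command` and `pip_install` are hypothetical, and the `--user` flag is the one pip's own error message suggests.

```python
import subprocess
import sys

def pip_install_command(package, user=True):
    """Build a pip command bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if user:
        cmd.append("--user")  # install into the per-user site-packages
    return cmd

def pip_install(package, user=True):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(pip_install_command(package, user), check=True)

# Only print the command here; call pip_install("bs4") to actually run it.
print(pip_install_command("bs4"))
```

The same command can be typed by hand as the full path to python.exe followed by `-m pip install bs4 --user`.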

A quick search, along with pip's own hint, suggested installing with the --user option, so I tried that.

pip install bs4 --user

That did the trick; the install succeeded:

C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4 --user
Collecting bs4
Collecting beautifulsoup4 (from bs4)
  Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.8.0 bs4-0.0.1 soupsieve-1.9.3

The installed packages now appear under "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages" on my machine.
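That folder is the per-user site-packages directory, which is where pip puts packages installed with --user. A small sketch using the standard `site` module can confirm the location for the current interpreter and whether it is on the import path (the printed values will differ from machine to machine):

```python
import site
import sys

# Directory that "pip install --user" writes to for this interpreter.
user_site = site.getusersitepackages()

print("User site-packages:", user_site)
print("User site enabled:", site.ENABLE_USER_SITE)
print("On sys.path:", user_site in sys.path)
```

If the directory is not on sys.path, packages installed with --user will not be importable from that interpreter.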

bs4 also shows up in the package list of the Python 3 environment in PyCharm.

With bs4 installed, I ran the script in PyCharm again. It still failed, but with a different error: "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml."

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
Traceback (most recent call last):
  File "E:/test8/去哪网城市.py", line 26, in <module>
    city_name_list,city_url_list=get_city_id()
  File "E:/test8/去哪网城市.py", line 14, in get_city_id
    bsObj=get_static_url_content(url)
  File "E:/test8/去哪网城市.py", line 9, in get_static_url_content
    bsObj=BeautifulSoup(content,'lxml')
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\bs4\__init__.py", line 208, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Process finished with exit code 1

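Further searching showed that the lxml package also has to be installed, because the script asks BeautifulSoup for the 'lxml' parser. As before, I ran the following in cmd: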

C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install lxml --user
Collecting lxml
  Using cached https://files.pythonhosted.org/packages/6f/6d/d54317403070fcaae973f38b9c298e4b4c101b469ae51afa7c1370e5c35b/lxml-4.4.1-cp36-cp36m-win_amd64.whl
Installing collected packages: lxml
Successfully installed lxml-4.4.1

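After installing lxml, I ran the script in PyCharm once more. This time it finished without errors, and the scraped results were written to the specified output folder.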
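If installing lxml is not an option, one possible workaround is to fall back to Python's built-in 'html.parser', which BeautifulSoup supports without extra packages. This is a sketch and not what the original script does; `pick_parser` and `make_soup` are hypothetical helper names, and the built-in parser may handle malformed HTML slightly differently from lxml.

```python
import importlib.util

def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its module is importable, else the fallback."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback

def make_soup(html):
    """Parse html with whichever parser is available (requires bs4 to be installed)."""
    from bs4 import BeautifulSoup  # imported lazily so pick_parser works without bs4
    return BeautifulSoup(html, pick_parser())

print("Parser that would be used:", pick_parser())
```

In the script above, this would mean replacing BeautifulSoup(content,'lxml') with a call such as make_soup(content).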

 

References:

https://www.cnblogs.com/sddai/p/10209931.html

https://blog.youkuaiyun.com/qq_34215281/article/details/77714584

https://blog.youkuaiyun.com/Noob_coder_JZ/article/details/82821042
