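I was recently testing a script whose job is to fetch information about a number of cities from Qunar (去哪儿网) with a Python web scraper.
The code is as follows: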
from bs4 import BeautifulSoup
import pandas as pd
import requests


def get_static_url_content(url):
    # Send a browser-like User-Agent and parse the response with the lxml parser.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = requests.get(url, headers=headers)
    content = req.text
    bsObj = BeautifulSoup(content, 'lxml')
    return bsObj


def get_city_id():
    url = 'http://travel.qunar.com/place/'
    bsObj = get_static_url_content(url)
    cat_url = []
    cat_name = []
    # Collect the text and href of every link inside the div.sub_list blocks.
    bs = bsObj.find_all('div', attrs={'class': 'sub_list'})
    for i in range(0, len(bs)):
        xxx = bs[i].find_all('a')
        for j in range(0, len(xxx)):
            cat_name.append(xxx[j].text)
            cat_url.append(xxx[j].attrs['href'])
    return cat_name, cat_url


# Save the city names and URLs to a CSV file (utf_8_sig so Excel reads the Chinese text correctly).
city_name_list, city_url_list = get_city_id()
city = pd.DataFrame({'city_name': city_name_list, 'city_code': city_url_list})
city.to_csv('./输出文件/city.csv', encoding='utf_8_sig')
Running it in PyCharm with a Python 3 environment produced an error:
"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
Traceback (most recent call last):
  File "E:/test8/去哪网城市.py", line 1, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
Process finished with exit code 1
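Before installing anything, it can help to confirm which interpreter is actually running the script and which of the imports it can see. The snippet below is a minimal sketch of such a check. The helper name `module_available` is my own and not part of the original script.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can import the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# The interpreter running this script; compare it with the first line of PyCharm's output.
print("Interpreter:", sys.executable)

# The third-party packages the scraper depends on.
for module in ("bs4", "lxml", "pandas", "requests"):
    status = "found" if module_available(module) else "MISSING"
    print(f"{module}: {status}")
```

Any package reported as MISSING has to be installed into the environment that `sys.executable` belongs to, not into some other Python on the machine.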
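It turned out that the third-party bs4 library was missing from the current Python environment, so I set out to install it with pip. I ran the command below in cmd, but that failed as well, this time with an access-denied error ([WinError 5] 拒绝访问), as shown below: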
Microsoft Windows [版本 10.0.17134.1006]
(c) 2018 Microsoft Corporation。保留所有权利。
C:\Users\admin>cd C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4
Collecting bs4
Collecting beautifulsoup4 (from bs4)
Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4, bs4
Could not install packages due to an EnvironmentError: [WinError 5] 拒绝访问。: 'C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\Lib\\site-packages\\soupsieve'
Consider using the `--user` option or check the permissions.
Note: C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts is the directory where pip.exe lives on my machine.
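A search turned up the --user option, which pip itself also suggests in the last line of its output. The environment sits under C:\Program Files, which a normal account cannot write to, whereas --user installs into the per-user site-packages directory instead. So I gave it a try: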
pip install bs4 --user
This really did install successfully:
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4 --user
Collecting bs4
Collecting beautifulsoup4 (from bs4)
Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.8.0 bs4-0.0.1 soupsieve-1.9.3
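Changing into the Scripts folder is one way to make sure the right pip.exe is used. Another common way is to run pip as a module of the interpreter itself (`python -m pip`), which always targets that interpreter's environment. The sketch below builds such a command. The function names `pip_install_command` and `pip_install` are hypothetical helpers, not something from the original post.

```python
import subprocess
import sys


def pip_install_command(package, user=True):
    """Build a `python -m pip install` command for the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install", package]
    if user:
        # Install into the per-user site-packages instead of the environment folder.
        command.append("--user")
    return command


def pip_install(package, user=True):
    """Run the install; raises subprocess.CalledProcessError if pip exits with an error."""
    subprocess.run(pip_install_command(package, user=user), check=True)


# Only print the command here; call pip_install("bs4") to actually run it.
print(" ".join(pip_install_command("bs4")))
```

Running this with the arcgispro-py3 python.exe should produce the equivalent of the `pip install bs4 --user` command shown above, without depending on which pip.exe happens to come first on PATH.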
The installed packages also show up on my machine under "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages".
bs4 also appears in PyCharm's Python 3 environment.
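The AppData path is the per-user site-packages directory, which is where `--user` installs go. The standard `site` module can report that location for whichever interpreter runs it. This is a small sketch, and the helper name `describe_user_site` is my own.

```python
import site
import sys


def describe_user_site():
    """Collect where --user installs go and whether this interpreter looks there."""
    user_site = site.getusersitepackages()
    return {
        "path": user_site,
        # True when the user site is enabled; False or None when it is disabled.
        "enabled": site.ENABLE_USER_SITE,
        # The directory is only added to sys.path if it exists at startup.
        "on_sys_path": user_site in sys.path,
    }


for key, value in describe_user_site().items():
    print(f"{key}: {value}")
```

If `on_sys_path` is False even though the folder exists, the interpreter is probably running with the user site disabled, and packages installed with `--user` will not be importable from it.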
With bs4 installed, I ran the script in PyCharm again. It still failed, but this time with a different error: "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml."
"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
Traceback (most recent call last):
  File "E:/test8/去哪网城市.py", line 26, in <module>
    city_name_list,city_url_list=get_city_id()
  File "E:/test8/去哪网城市.py", line 14, in get_city_id
    bsObj=get_static_url_content(url)
  File "E:/test8/去哪网城市.py", line 9, in get_static_url_content
    bsObj=BeautifulSoup(content,'lxml')
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\bs4\__init__.py", line 208, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Process finished with exit code 1
Process finished with exit code 1
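The traceback shows that bs4 itself now imports fine. What is missing is the parser named in the second argument of `BeautifulSoup(content,'lxml')`, which is a separate package. Beautiful Soup also accepts `'html.parser'`, the parser from the Python standard library, so one option is to fall back to it when lxml is not installed. The sketch below assumes that fallback is acceptable for the page being scraped. `pick_parser` and `make_soup` are hypothetical helper names.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if that parser package is importable, otherwise `fallback`."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


def make_soup(html):
    """Parse `html` with the best available parser (requires bs4 to be installed)."""
    # Imported lazily so that pick_parser can be used even where bs4 is missing.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, pick_parser())


print("Parser that would be used:", pick_parser())
```

The two parsers can build slightly different trees from malformed HTML, so results should be checked after switching. Installing lxml keeps the script's behavior exactly as written.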
Another search on Baidu showed that the lxml package has to be installed as well. As before, I ran the following in cmd:
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install lxml --user
Collecting lxml
Using cached https://files.pythonhosted.org/packages/6f/6d/d54317403070fcaae973f38b9c298e4b4c101b469ae51afa7c1370e5c35b/lxml-4.4.1-cp36-cp36m-win_amd64.whl
Installing collected packages: lxml
Successfully installed lxml-4.4.1
With lxml installed, the script finally ran in PyCharm without errors, and the scraped results were written to the specified output folder.
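One more thing worth guarding against: `to_csv` does not create missing folders, so the script only works if the output folder already exists next to it. A minimal sketch of creating the folder first is shown below. The helper name `ensure_parent_dir` is my own, and the commented usage assumes the `city` DataFrame from the script.

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the folder that will contain `file_path` if it does not exist yet."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Hypothetical usage with the script's output path:
# out_path = ensure_parent_dir('./输出文件/city.csv')
# city.to_csv(out_path, encoding='utf_8_sig')
```

Only the folder is created here. The CSV file itself is still written by `to_csv`.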
References:
https://www.cnblogs.com/sddai/p/10209931.html
https://blog.youkuaiyun.com/qq_34215281/article/details/77714584
https://blog.youkuaiyun.com/Noob_coder_JZ/article/details/82821042