Numpy_pandas&wordcloud

I. Reading an Excel file with xlrd

In [1]:

import xlrd

In [2]:

pip show xlrd
Note: you may need to restart the kernel to use updated packages.
Name: xlrd
Version: 2.0.1
Summary: Library for developers to extract data from Microsoft Excel (tm) .xls spreadsheet files
Home-page: http://www.python-excel.org/
Author: Chris Withers
Author-email: chris@withers.org
License: BSD
Location: c:\users\tx\anaconda3\lib\site-packages
Requires: 
Required-by: 

In [8]:

wb = xlrd.open_workbook(r'..\Stu_pack13\Stu_pack13\pandas\school.xls')

sheet = wb.sheet_by_index(0)

In [15]:

wb = xlrd.open_workbook(r'..\Stu_pack13\Stu_pack13\pandas\school.xls')

sheet = wb.sheet_by_index(0)

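schools = []           # empty 2-D list that will hold one list per spreadsheet row
for i in range(20):    # read only the first 20 rows (the header plus 19 records)
    school = []        # empty 1-D list for the cell values of the current row
    for j in range(sheet.ncols):
        content = sheet.cell_value(i, j)    # read the value of the cell at row i, column j
        school.append(content)              # append the value to the row list `school`
    schools.append(school)                  # append the finished row to the 2-D list
# Print the rows that were read:
for school in schools:
    print(school)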
['招生单位代码', '招生单位名称', '所在省份', '是否985', '是否211', '是否自主划线', '学校类型']
['10001', '北京大学', '北京市', '是', '是', '是', '综合类']
['10002', '中国人民大学', '北京市', '是', '是', '是', '综合类']
['10003', '清华大学', '北京市', '是', '是', '是', '理工类']
['10004', '北京交通大学', '北京市', '否', '是', '否', '理工类']
['10005', '北京工业大学', '北京市', '否', '是', '否', '理工类']
['10006', '北京航空航天大学', '北京市', '是', '是', '是', '理工类']
['10007', '北京理工大学', '北京市', '是', '是', '是', '理工类']
['10008', '北京科技大学', '北京市', '否', '是', '否', '理工类']
['10009', '北方工业大学', '北京市', '否', '否', '否', '理工类']
['10010', '北京化工大学', '北京市', '否', '是', '否', '理工类']
['10011', '北京工商大学', '北京市', '否', '否', '否', '']
['10012', '北京服装学院', '北京市', '否', '否', '否', '理工类']
['10013', '北京邮电大学', '北京市', '否', '是', '否', '理工类']
['10015', '北京印刷学院', '北京市', '否', '否', '否', '理工类']
['10016', '北京建筑大学', '北京市', '否', '否', '否', '理工类']
['10017', '北京石油化工学院', '北京市', '否', '否', '否', '']
['10018', '北京电子科技学院', '北京市', '否', '否', '否', '']
['10019', '中国农业大学', '北京市', '是', '是', '是', '农林类']
['10020', '北京农学院', '北京市', '否', '否', '否', '农林类']
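
The nested loop above hard-codes 20 rows. As a minimal sketch, the same result can be obtained from the sheet's own row count; `read_rows` and its `max_rows` parameter are hypothetical names introduced here, and the function only assumes that the sheet exposes `nrows` and `row_values(i)`, as xlrd sheets do.

```python
def read_rows(sheet, max_rows=None):
    """Return the sheet's rows as a list of lists.

    `sheet` is assumed to expose `nrows` and `row_values(i)` (as xlrd
    sheets do); `max_rows` is a hypothetical convenience parameter.
    """
    # Never read past the last row, even if max_rows is larger than the sheet.
    limit = sheet.nrows if max_rows is None else min(max_rows, sheet.nrows)
    return [sheet.row_values(i) for i in range(limit)]
```

With the workbook opened as above, `schools = read_rows(sheet, max_rows=20)` should give the same list as the loop, and `read_rows(sheet)` reads every row.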

II. Reading an Excel file with pandas

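(1) pandas is an open-source data analysis toolkit built on NumPy; it provides high-performance, easy-to-use data structures and data analysis functions.
(2) NumPy provides the ndarray, a multidimensional array that stores elements of a single type; every element of an ndarray occupies the same amount of memory.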
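
A small sketch of point (2), assuming NumPy is installed: every element of an ndarray shares one dtype and therefore one size in bytes. The array values are arbitrary sample numbers, and the dtype is set explicitly so the sizes do not depend on the platform.

```python
import numpy as np

# A 2 x 3 array whose elements are all 64-bit integers.
scores = np.array([[83, 78, 98],
                   [67, 93, 56]], dtype=np.int64)

print(scores.dtype)      # int64
print(scores.shape)      # (2, 3)
print(scores.itemsize)   # 8  (bytes per element)
print(scores.nbytes)     # 48 (6 elements x 8 bytes)

# Mixing an int and a float still yields a single common element type.
mixed = np.array([1, 2.5])
print(mixed.dtype)       # float64
```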

In [16]:

pip show pandas
Note: you may need to restart the kernel to use updated packages.
Name: pandas
Version: 1.2.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: c:\users\tx\anaconda3\lib\site-packages
Requires: numpy, pytz, python-dateutil
Required-by: statsmodels, seaborn

In [17]:

pip show numpy
Name: numpy
Version: 1.20.1
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: None
License: BSD
Location: c:\users\tx\anaconda3\lib\site-packages
Requires: 
Required-by: tifffile, tables, statsmodels, seaborn, scipy, scikit-learn, scikit-image, PyWavelets, pyerfa, patsy, pandas, numexpr, numba, mkl-random, mkl-fft, matplotlib, imageio, imagecodecs, h5py, Bottleneck, bokeh, bkcharts, astropy
Note: you may need to restart the kernel to use updated packages.

In [18]:

pip install pandas
Requirement already satisfied: pandas in c:\users\tx\anaconda3\lib\site-packages (1.2.4)
Requirement already satisfied: numpy>=1.16.5 in c:\users\tx\anaconda3\lib\site-packages (from pandas) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\tx\anaconda3\lib\site-packages (from pandas) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\tx\anaconda3\lib\site-packages (from pandas) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\tx\anaconda3\lib\site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Note: you may need to restart the kernel to use updated packages.

In [19]:

pip install numpy
Requirement already satisfied: numpy in c:\users\tx\anaconda3\lib\site-packages (1.20.1)
Note: you may need to restart the kernel to use updated packages.

In [20]:

import pandas as pd

In [24]:

data = pd.read_excel(r'..\Stu_pack13\Stu_pack13\pandas\school.xls')

In [26]:

data.head(20)

Out[26]:

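 | 招生单位代码 | 招生单位名称 | 所在省份 | 是否985 | 是否211 | 是否自主划线 | 学校类型
0 | 10001 | 北京大学 | 北京市 |  |  |  | 综合类
1 | 10002 | 中国人民大学 | 北京市 |  |  |  | 综合类
2 | 10003 | 清华大学 | 北京市 |  |  |  | 理工类
3 | 10004 | 北京交通大学 | 北京市 |  |  |  | 理工类
4 | 10005 | 北京工业大学 | 北京市 |  |  |  | 理工类
5 | 10006 | 北京航空航天大学 | 北京市 |  |  |  | 理工类
6 | 10007 | 北京理工大学 | 北京市 |  |  |  | 理工类
7 | 10008 | 北京科技大学 | 北京市 |  |  |  | 理工类
8 | 10009 | 北方工业大学 | 北京市 |  |  |  | 理工类
9 | 10010 | 北京化工大学 | 北京市 |  |  |  | 理工类
10 | 10011 | 北京工商大学 | 北京市 |  |  |  | NaN
11 | 10012 | 北京服装学院 | 北京市 |  |  |  | 理工类
12 | 10013 | 北京邮电大学 | 北京市 |  |  |  | 理工类
13 | 10015 | 北京印刷学院 | 北京市 |  |  |  | 理工类
14 | 10016 | 北京建筑大学 | 北京市 |  |  |  | 理工类
15 | 10017 | 北京石油化工学院 | 北京市 |  |  |  | NaN
16 | 10018 | 北京电子科技学院 | 北京市 |  |  |  | NaN
17 | 10019 | 中国农业大学 | 北京市 |  |  |  | 农林类
18 | 10020 | 北京农学院 | 北京市 |  |  |  | 农林类
19 | 10022 | 北京林业大学 | 北京市 |  |  |  | 农林类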

In [28]:

data[data.所在省份=='上海市']

Out[28]:

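 | 招生单位代码 | 招生单位名称 | 所在省份 | 是否985 | 是否211 | 是否自主划线 | 学校类型
167 | 10246 | 复旦大学 | 上海市 |  |  |  | 综合类
168 | 10247 | 同济大学 | 上海市 |  |  |  | 理工类
169 | 10248 | 上海交通大学 | 上海市 |  |  |  | 综合类
170 | 10251 | 华东理工大学 | 上海市 |  |  |  | 理工类
171 | 10252 | 上海理工大学 | 上海市 |  |  |  | 理工类
172 | 10254 | 上海海事大学 | 上海市 |  |  |  | 理工类
173 | 10255 | 东华大学 | 上海市 |  |  |  | 理工类
174 | 10256 | 上海电力学院 | 上海市 |  |  |  | 理工类
175 | 10259 | 上海应用技术大学 | 上海市 |  |  |  | 理工类
176 | 10264 | 上海海洋大学 | 上海市 |  |  |  | 农林类
177 | 10268 | 上海中医药大学 | 上海市 |  |  |  | 医药类
178 | 10269 | 华东师范大学 | 上海市 |  |  |  | 师范类
179 | 10270 | 上海师范大学 | 上海市 |  |  |  | 师范类
180 | 10271 | 上海外国语大学 | 上海市 |  |  |  | 语言类
181 | 10272 | 上海财经大学 | 上海市 |  |  |  | 财经类
182 | 10273 | 上海对外经贸大学 | 上海市 |  |  |  | NaN
183 | 10274 | 上海海关学院 | 上海市 |  |  |  | NaN
184 | 10276 | 华东政法大学 | 上海市 |  |  |  | 政法类
185 | 10277 | 上海体育学院 | 上海市 |  |  |  | 体育类
186 | 10278 | 上海音乐学院 | 上海市 |  |  |  | 艺术类
187 | 10279 | 上海戏剧学院 | 上海市 |  |  |  | 艺术类
188 | 10280 | 上海大学 | 上海市 |  |  |  | NaN
487 | 10856 | 上海工程技术大学 | 上海市 |  |  |  | 理工类
497 | 11047 | 上海立信会计金融学院 | 上海市 |  |  |  | NaN
533 | 11458 | 上海电机学院 | 上海市 |  |  |  | NaN
559 | 11835 | 上海政法学院 | 上海市 |  |  |  | NaN
566 | 12044 | 上海第二工业大学 | 上海市 |  |  |  | NaN
582 | 80402 | 上海国家会计学院 | 上海市 |  |  |  | NaN
614 | 82707 | 上海材料研究所 | 上海市 |  |  |  | NaN
618 | 82717 | 上海发电设备成套设计研究院 | 上海市 |  |  |  | 理工类
619 | 82718 | 上海内燃机研究所 | 上海市 |  |  |  | NaN
623 | 82805 | 上海核工程研究设计院 | 上海市 |  |  |  | NaN
650 | 82937 | 中国航空研究院640所 | 上海市 |  |  |  | NaN
661 | 83009 | 华东计算技术研究所 | 上海市 |  |  |  | NaN
693 | 83285 | 上海航天技术研究院(航天八院) | 上海市 |  |  |  | NaN
698 | 83502 | 上海化工研究院 | 上海市 |  |  |  | NaN
707 | 83901 | 上海船舶运输科学研究所 | 上海市 |  |  |  | NaN
710 | 84002 | 电信科学技术第一研究所(上海) | 上海市 |  |  |  | NaN
720 | 84505 | 上海生物制品研究所 | 上海市 |  |  |  | NaN
741 | 85901 | 中国医药工业研究总院 | 上海市 |  |  |  | 医药类
748 | 86206 | 中国船舶及海洋工程设计研究院 | 上海市 |  |  |  | NaN
749 | 86207 | 上海船舶设备研究所 | 上海市 |  |  |  | NaN
750 | 86208 | 上海船用柴油机研究所 | 上海市 |  |  |  | NaN
761 | 86219 | 上海船舶电子设备研究所 | 上海市 |  |  |  | NaN
781 | 87901 | 上海市计算技术研究所 | 上海市 |  |  |  | NaN
782 | 87902 | 上海国际问题研究院 | 上海市 |  |  |  | NaN
783 | 87903 | 上海社会科学院 | 上海市 |  |  |  | NaN
795 | 89631 | 中共上海市委党校 | 上海市 |  |  |  | NaN
830 | 90030 | 第二军医大学 | 上海市 |  |  |  | NaN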
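
The filter in In [28] is a boolean mask: the comparison produces one True/False value per row, and indexing with it keeps the True rows together with their original index labels, which is why the output above starts at 167 rather than 0. A self-contained sketch, assuming pandas is installed and using a four-row frame copied from the outputs in this section (not the full school.xls file):

```python
import pandas as pd

# A few rows copied from the outputs above; None stands in for a missing school type.
data = pd.DataFrame({
    '招生单位代码': [10001, 10246, 10247, 10280],
    '招生单位名称': ['北京大学', '复旦大学', '同济大学', '上海大学'],
    '所在省份': ['北京市', '上海市', '上海市', '上海市'],
    '学校类型': ['综合类', '综合类', '理工类', None],
})

is_shanghai = data['所在省份'] == '上海市'    # boolean Series, one value per row
shanghai = data[is_shanghai]                  # keep only the rows where the mask is True

print(shanghai['招生单位名称'].tolist())       # ['复旦大学', '同济大学', '上海大学']
print(shanghai.index.tolist())                # [1, 2, 3] (original labels are kept)
print(int(shanghai['学校类型'].isna().sum()))  # 1 (rows whose school type is missing)
```

The bracket form `data['所在省份']` is equivalent to the attribute form `data.所在省份` used in the cell and also works for column names that are not valid Python identifiers.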

In [31]:

data = pd.read_excel(r'..\Stu_pack13\Stu_pack13\pandas\exer_2.xlsx', skiprows=1)       # skip the first row of the sheet so that the following row is used as the header

In [32]:

data.head()

Out[32]:

 | 姓名 | 语文 | 数学 | 英语 | 总分
0 | Aa | 83 | 78 | 98 | 259
1 | Bb | 67 | 93 | 56 | 216
2 | Cc | 59 | 86 | 86 | 231
3 | Dd | 75 | 60 | 59 | 194
4 | Ee | 81 | 81 | 79 | 241
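
The effect of `skiprows=1` can be sketched without the Excel file. The example below assumes pandas is installed and uses `pd.read_csv` on in-memory text, since `read_csv` accepts the same `skiprows` parameter; the title line "成绩表" is a hypothetical stand-in for whatever the first row of exer_2.xlsx contains, and the two data rows are copied from the output above.

```python
import io
import pandas as pd

raw = io.StringIO(
    "成绩表\n"                      # hypothetical title row above the real header
    "姓名,语文,数学,英语,总分\n"
    "Aa,83,78,98,259\n"
    "Bb,67,93,56,216\n"
)

# Skip the title row so that the second line becomes the column header.
data = pd.read_csv(raw, skiprows=1)

print(list(data.columns))   # ['姓名', '语文', '数学', '英语', '总分']

# Sanity check: the total column equals the sum of the three subject columns.
subject_sum = data[['语文', '数学', '英语']].sum(axis=1)
print(bool((subject_sum == data['总分']).all()))   # True
```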

III. Installing and using the wordcloud library

1. Installation
(1) Online installation: pip install wordcloud
(2) Local installation: pip install <local path>/<package file name>

In [33]:

pip list
Package                            Version
---------------------------------- -------------------
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-navigator                 2.0.3
anaconda-project                   0.9.1
anyio                              2.2.0
appdirs                            1.4.4
argh                               0.26.2
argon2-cffi                        20.1.0
asn1crypto                         1.4.0
astroid                            2.5
astropy                            4.2.1
async-generator                    1.10
atomicwrites                       1.4.0
attrs                              20.3.0
autopep8                           1.5.6
Babel                              2.9.0
backcall                           0.2.0
backports.functools-lru-cache      1.6.4
backports.shutil-get-terminal-size 1.0.0
backports.tempfile                 1.0
backports.weakref                  1.0.post1
bcrypt                             3.2.0
beautifulsoup4                     4.9.3
bitarray                           1.9.2
bkcharts                           0.2
black                              19.10b0
bleach                             3.3.0
bokeh                              2.3.2
boto                               2.49.0
Bottleneck                         1.3.2
brotlipy                           0.7.0
certifi                            2020.12.5
cffi                               1.14.5
chardet                            4.0.0
click                              7.1.2
cloudpickle                        1.6.0
clyent                             1.2.2
colorama                           0.4.4
comtypes                           1.1.9
conda                              4.10.1
conda-build                        3.21.4
conda-content-trust                0+unknown
conda-package-handling             1.7.3
conda-repo-cli                     1.0.4
conda-token                        0.3.0
conda-verify                       3.4.2
contextlib2                        0.6.0.post1
cryptography                       3.4.7
cycler                             0.10.0
Cython                             0.29.23
cytoolz                            0.11.0
dask                               2021.4.0
decorator                          5.0.6
defusedxml                         0.7.1
diff-match-patch                   20200713
distributed                        2021.4.0
docutils                           0.17
entrypoints                        0.3
et-xmlfile                         1.0.1
fastcache                          1.1.0
filelock                           3.0.12
flake8                             3.9.0
Flask                              1.1.2
fsspec                             0.9.0
future                             0.18.2
gevent                             21.1.2
glob2                              0.7
greenlet                           1.0.0
h5py                               2.10.0
HeapDict                           1.0.1
html5lib                           1.1
idna                               2.10
imagecodecs                        2021.3.31
imageio                            2.9.0
imagesize                          1.2.0
importlib-metadata                 3.10.0
iniconfig                          1.1.1
intervaltree                       3.1.0
ipykernel                          5.3.4
ipython                            7.22.0
ipython-genutils                   0.2.0
ipywidgets                         7.6.3
isort                              5.8.0
itsdangerous                       1.1.0
jdcal                              1.4.1
jedi                               0.17.2
Jinja2                             2.11.3
joblib                             1.0.1
json5                              0.9.5
jsonschema                         3.2.0
jupyter                            1.0.0
jupyter-client                     6.1.12
jupyter-console                    6.4.0
jupyter-core                       4.7.1
jupyter-packaging                  0.7.12
jupyter-server                     1.4.1
jupyterlab                         3.0.14
jupyterlab-pygments                0.1.2
jupyterlab-server                  2.4.0
jupyterlab-widgets                 1.0.0
keyring                            22.3.0
kiwisolver                         1.3.1
lazy-object-proxy                  1.6.0
libarchive-c                       2.9
llvmlite                           0.36.0
locket                             0.2.1
lxml                               4.6.3
MarkupSafe                         1.1.1
matplotlib                         3.3.4
mccabe                             0.6.1
menuinst                           1.4.16
mistune                            0.8.4
mkl-fft                            1.3.0
mkl-random                         1.2.1
mkl-service                        2.3.0
mock                               4.0.3
more-itertools                     8.7.0
mpmath                             1.2.1
msgpack                            1.0.2
multipledispatch                   0.6.0
mypy-extensions                    0.4.3
navigator-updater                  0.2.1
nbclassic                          0.2.6
nbclient                           0.5.3
nbconvert                          6.0.7
nbformat                           5.1.3
nest-asyncio                       1.5.1
networkx                           2.5
nltk                               3.6.1
nose                               1.3.7
notebook                           6.3.0
numba                              0.53.1
numexpr                            2.7.3
numpy                              1.20.1
numpydoc                           1.1.0
olefile                            0.46
openpyxl                           3.0.7
packaging                          20.9
pandas                             1.2.4
pandocfilters                      1.4.3
paramiko                           2.7.2
parso                              0.7.0
partd                              1.2.0
path                               15.1.2
pathlib2                           2.3.5
pathspec                           0.7.0
patsy                              0.5.1
pep8                               1.7.1
pexpect                            4.8.0
pickleshare                        0.7.5
Pillow                             8.2.0
pip                                21.0.1
pkginfo                            1.7.0
pluggy                             0.13.1
ply                                3.11
prometheus-client                  0.10.1
prompt-toolkit                     3.0.17
psutil                             5.8.0
ptyprocess                         0.7.0
py                                 1.10.0
pycodestyle                        2.6.0
pycosat                            0.6.3
pycparser                          2.20
pycurl                             7.43.0.6
pydocstyle                         6.0.0
pyerfa                             1.7.3
pyflakes                           2.2.0
Pygments                           2.8.1
pylint                             2.7.4
pyls-black                         0.4.6
pyls-spyder                        0.3.2
PyNaCl                             1.4.0
pyodbc                             4.0.0-unsupported
pyOpenSSL                          20.0.1
pyparsing                          2.4.7
pyreadline                         2.1
pyrsistent                         0.17.3
PySocks                            1.7.1
pytest                             6.2.3
python-dateutil                    2.8.1
python-jsonrpc-server              0.4.0
python-language-server             0.36.2
pytz                               2021.1
PyWavelets                         1.1.1
pywin32                            227
pywin32-ctypes                     0.2.0
pywinpty                           0.5.7
PyYAML                             5.4.1
pyzmq                              20.0.0
QDarkStyle                         2.8.1
QtAwesome                          1.0.2
qtconsole                          5.0.3
QtPy                               1.9.0
regex                              2021.4.4
requests                           2.25.1
rope                               0.18.0
Rtree                              0.9.7
ruamel-yaml-conda                  0.15.100
scikit-image                       0.18.1
scikit-learn                       0.24.1
scipy                              1.6.2
seaborn                            0.11.1
Send2Trash                         1.5.0
setuptools                         52.0.0.post20210125
simplegeneric                      0.8.1
singledispatch                     0.0.0
sip                                4.19.13
six                                1.15.0
sniffio                            1.2.0
snowballstemmer                    2.1.0
sortedcollections                  2.1.0
sortedcontainers                   2.3.0
soupsieve                          2.2.1
Sphinx                             4.0.1
sphinxcontrib-applehelp            1.0.2
sphinxcontrib-devhelp              1.0.2
sphinxcontrib-htmlhelp             1.0.3
sphinxcontrib-jsmath               1.0.1
sphinxcontrib-qthelp               1.0.3
sphinxcontrib-serializinghtml      1.1.4
sphinxcontrib-websupport           1.2.4
spyder                             4.2.5
spyder-kernels                     1.10.2
SQLAlchemy                         1.4.7
statsmodels                        0.12.2
sympy                              1.8
tables                             3.6.1
tblib                              1.7.0
terminado                          0.9.4
testpath                           0.4.4
textdistance                       4.2.1
threadpoolctl                      2.1.0
three-merge                        0.1.1
tifffile                           2021.4.8
toml                               0.10.2
toolz                              0.11.1
tornado                            6.1
tqdm                               4.59.0
traitlets                          5.0.5
typed-ast                          1.4.2
typing-extensions                  3.7.4.3
ujson                              4.0.2
unicodecsv                         0.14.1
urllib3                            1.26.4
watchdog                           1.0.2
wcwidth                            0.2.5
webencodings                       0.5.1
Werkzeug                           1.0.1
wheel                              0.36.2
widgetsnbextension                 3.5.1
win-inet-pton                      1.1.0
win-unicode-console                0.5
wincertstore                       0.2
wrapt                              1.12.1
xlrd                               2.0.1
XlsxWriter                         1.3.8
xlwings                            0.23.0
xlwt                               1.3.0
xmltodict                          0.12.0
yapf                               0.31.0
zict                               2.0.0
zipp                               3.4.1
zope.event                         4.5.0
zope.interface                     5.3.0

In [34]:

pip show wordcloud
Note: you may need to restart the kernel to use updated packages.
WARNING: Package(s) not found: wordcloud

In [35]:

pip install C:\Users\tx\Desktop\2023110209042雷锦宇\Stu_pack13\Stu_pack13\wordcloud\wordcloud-1.8.1-pp38-pypy38_pp73-win_amd64.whl
Note: you may need to restart the kernel to use updated packages.
ERROR: wordcloud-1.8.1-pp38-pypy38_pp73-win_amd64.whl is not a supported wheel on this platform.
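
A likely reason for the "not a supported wheel" error is visible in the file name itself: the `pp38` and `pypy38_pp73` tags mark a build for PyPy 3.8, whereas an Anaconda environment normally runs CPython. The sketch below uses only the standard library; `wheel_tags` is a hypothetical helper that relies on the wheel naming convention in which the last three dash-separated fields are the Python, ABI, and platform tags.

```python
import sys

def wheel_tags(filename):
    """Split a wheel file name into its (python, abi, platform) tags.

    Wheel names follow name-version[-build]-python-abi-platform.whl, so the
    last three dash-separated fields are the compatibility tags.
    """
    stem = filename[:-len('.whl')] if filename.endswith('.whl') else filename
    python_tag, abi_tag, platform_tag = stem.split('-')[-3:]
    return python_tag, abi_tag, platform_tag

name = 'wordcloud-1.8.1-pp38-pypy38_pp73-win_amd64.whl'
python_tag, abi_tag, platform_tag = wheel_tags(name)
print(python_tag, abi_tag, platform_tag)   # pp38 pypy38_pp73 win_amd64

# 'pp' marks a PyPy build and 'cp' a CPython build; compare with the running interpreter.
print(sys.implementation.name, sys.version_info[:2])
```

If the interpreter reports `cpython`, a wheel whose Python tag starts with `cp` and matches the interpreter version is the one pip should accept for a local install.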

In [36]:

pip install wordcloud
Note: you may need to restart the kernel to use updated packages.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000264D0BBB160>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/wordcloud/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000264D0BBB3A0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/wordcloud/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000264D0BBB580>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/wordcloud/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000264D0BBB760>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/wordcloud/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000264D0BBB940>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/wordcloud/
ERROR: Could not find a version that satisfies the requirement wordcloud
ERROR: No matching distribution found for wordcloud

In [37]:

pip show jieda
Note: you may need to restart the kernel to use updated packages.
WARNING: Package(s) not found: jieda

In [38]:

pip show wordcloud
Note: you may need to restart the kernel to use updated packages.
WARNING: Package(s) not found: wordcloud
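
2. Usage
Steps for generating a word cloud:
   create a WordCloud object --> load the text --> output the word cloud image (file)
   (1) the default rectangular word cloud image
   (2) a word cloud image in a supplied shape
Example 1: generating the default rectangular word cloud (English source text)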

In [44]:

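import wordcloud                                  # (1) import the word cloud library
wc = wordcloud.WordCloud()                        # (2) create a WordCloud object
im = wc.generate('2023110209042 lei jin yu')      # (3) load the text with generate(), which returns the WordCloud object
im.to_file('wordcloud.jpg')                       # save the image as wordcloud.jpg in the current working directory
im.to_image()                                     # return a PIL image; as the last expression of the cell it is displayed
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-44-8552a71745df> in <module>
----> 1 import wordcloud                                  # (1) import the word cloud library
      2 wc = wordcloud.WordCloud()                        # (2) create a WordCloud object
      3 im = wc.generate('2023110209042 lei jin yu')      # (3) load the text with generate(), which returns the WordCloud object
      4 im.to_file('wordcloud.jpg')                       # save the image as wordcloud.jpg in the current working directory
      5 im.to_image()                                     # return a PIL image; as the last expression of the cell it is displayed

ModuleNotFoundError: No module named 'wordcloud'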
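
Because both installation attempts above failed, the import raises ModuleNotFoundError. A minimal standard-library sketch for checking whether a package is importable before running a cell that depends on it; `is_installed` is a hypothetical helper name.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module of this name can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# Report the packages that the word cloud examples in this section import.
for name in ('wordcloud', 'PIL', 'numpy'):
    status = 'installed' if is_installed(name) else 'missing'
    print(f'{name}: {status}')
```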

In [45]:

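import wordcloud                                  # (1) import the word cloud library
wc = wordcloud.WordCloud()                        # (2) create a WordCloud object
with open(r'..\Stu_pack13\Stu_pack13\wordcloud\See You Again.txt') as file:
    fr = file.read()                              # read the whole text file into one string
im = wc.generate(fr)                              # (3) load the text that was read from the file
im.to_file('wordcloud.jpg')                       # save the image as wordcloud.jpg in the current working directory
im.to_image()                                     # return a PIL image; as the last expression of the cell it is displayed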
  File "<ipython-input-45-982a21768ced>", line 3
    with open(r'..\Stu_pack13\Stu_pack13\wordcloud\See You Again.txt') as file
                                                                              ^
SyntaxError: invalid syntax

Example 2: generating a word cloud in a given shape (English source text)

In [50]:

import wordcloud
from PIL import Image
import numpy as np            # numerical computing library
with open(r'..\Stu_pack13\Stu_pack13\wordcloud\See You Again.txt') as file:          # open the text file
    fr = file.read()
img = Image.open(r'..\Stu_pack13\Stu_pack13\wordcloud\Love_Star.PNG')           # open the mask image
im = np.array(img)                                                   # convert the PIL image to a NumPy array
# A color can be given as an English color name, an (R, G, B) tuple such as (200, 22, 100), or a '#' hex string.
# The original value '#22f99' has only five hex digits; '#22ff99' is an assumed correction.
wc = wordcloud.WordCloud(mask=im, background_color='#22ff99', font_path=r'..\Stu_pack13\Stu_pack13\wordcloud\simhei.ttf')
wc.generate(fr)
wc.to_file('LoveStar.png')
wc.to_image()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-50-f6771ff42cdc> in <module>
----> 1 import wordcloud
      2 from PIL import Image
      3 import numpy as np            # numerical computing library
      4 with open(r'..\Stu_pack13\Stu_pack13\wordcloud\See You Again.txt') as file:          # open the text file
      5     fr = file.read()

ModuleNotFoundError: No module named 'wordcloud'
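
The color comment in Example 2 lists hex strings as one accepted form. A five-digit value such as '#22f99' does not fit any of the usual hex lengths, so it is worth checking a color string before passing it on. The sketch below uses only the standard library; `is_hex_color` is a hypothetical helper, and the accepted lengths (3, 4, 6, or 8 hex digits) are an assumption based on the forms Pillow documents.

```python
import re

# Assumed valid hex forms: #rgb, #rgba, #rrggbb, #rrggbbaa.
_HEX_COLOR = re.compile(r'#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})')

def is_hex_color(value):
    """Return True if `value` is a '#'-prefixed hex color of a supported length."""
    return _HEX_COLOR.fullmatch(value) is not None

print(is_hex_color('#22f99'))    # False (five digits)
print(is_hex_color('#22ff99'))   # True
print(is_hex_color('#2f9'))      # True
```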

Example 3: Chinese source text

In [49]:

import wordcloud
from PIL import Image
import numpy as np            # numerical computing library
with open(r'..\Stu_pack13\Stu_pack13\wordcloud\万疆.txt', encoding='utf-8') as file:          # open the text file as UTF-8
    fr = file.read()
img = Image.open(r'..\Stu_pack13\Stu_pack13\wordcloud\Love_Star.PNG')           # open the mask image
im = np.array(img)                                                   # convert the PIL image to a NumPy array
# font_path points to a font with Chinese glyphs (simhei.ttf, as in Example 2); without one the Chinese text is not drawn correctly.
# The original value '#22f99' has only five hex digits; '#22ff99' is an assumed correction.
wc = wordcloud.WordCloud(mask=im, background_color='#22ff99', font_path=r'..\Stu_pack13\Stu_pack13\wordcloud\simhei.ttf')
wc.generate(fr)
wc.to_file('FiveStar.png')
wc.to_image()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-49-19874fdac3fd> in <module>
----> 1 import wordcloud
      2 from PIL import Image
      3 import numpy as np            # numerical computing library
      4 with open(r'..\Stu_pack13\Stu_pack13\wordcloud\万疆.txt', encoding='utf-8') as file:          # open the text file as UTF-8
      5     fr = file.read()

ModuleNotFoundError: No module named 'wordcloud'
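
Chinese text has no spaces between words, so it is usually split into words before a word cloud is built. The sketch below covers only the counting step with the standard library. The token list is hypothetical, standing in for the output of a segmenter such as jieba, which this notebook does not have installed, and `word_frequencies` is a helper name introduced here.

```python
from collections import Counter

def word_frequencies(tokens, min_length=2):
    """Count tokens, dropping surrounding whitespace and tokens shorter than min_length."""
    cleaned = (t.strip() for t in tokens)
    return dict(Counter(t for t in cleaned if len(t) >= min_length))

# Hypothetical, already-segmented tokens.
tokens = ['红日', '升', '在', '东方', '红日', '东方', '万疆']
freqs = word_frequencies(tokens)
print(freqs)   # {'红日': 2, '东方': 2, '万疆': 1}
```

Once wordcloud is installed, a dictionary like `freqs` can be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the raw text.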
