Reading and Writing Data with Pandas


sp_ur


Categories: Notes, python. Tags: pandas

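Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.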

Article link: https://blog.youkuaiyun.com/sp_ur/article/details/86595389


1. I/O API Tools

Reader function	Writer function
read_csv	to_csv
read_excel	to_excel
read_hdf	to_hdf
read_sql	to_sql
read_json	to_json
read_html	to_html
read_stata	to_stata
read_clipboard	to_clipboard
read_pickle	to_pickle
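read_msgpack	to_msgpack
read_gbq	to_gbq

Note: read_msgpack and to_msgpack were removed in pandas 1.0, so they are only available in older versions.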
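The reader/writer pairs in the table all follow the same pattern: a `to_*` method on the DataFrame and a matching `pd.read_*` function. A minimal sketch with the JSON pair, using made-up data and an in-memory buffer rather than a file (both are assumptions for illustration):

```python
import io
import pandas as pd

# A small illustrative frame (hypothetical data, not from the article).
frame = pd.DataFrame({"id": [1, 2], "animal": ["cat", "dog"]})

# to_json returns a string when no path is given.
json_text = frame.to_json(orient="records")

# read_json accepts a file-like object, so wrap the string in StringIO.
restored = pd.read_json(io.StringIO(json_text), orient="records")
print(restored)
```

The same round trip works for the other pairs, although some of them need optional dependencies or a real file path.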

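2. Reading and Writing CSV Files

A file in which the elements of each line are separated by commas is called a CSV file.

2.1. Reading Data from CSV

Simple read
excited.csv file: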

white,read,blue,green,animal
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse

>>> import pandas as pd
>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv')
>>> csvframe
   white  read  blue  green animal
0      1     5     2      3    cat
1      2     7     8      5    dog
2      3     3     6      7  horse
3      2     2     8      3   duck
4      4     4     2      1  mouse

Specifying the header with header and names
excited.csv file:

1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', header=None)
>>> csvframe
   0  1  2  3      4
0  1  5  2  3    cat
1  2  7  8  5    dog
2  3  3  6  7  horse
3  2  2  8  3   duck
4  4  4  2  1  mouse

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', names=['white', 'red', 'blue', 'green', 'animal'])
>>> csvframe
   white  red  blue  green animal
0      1    5     2      3    cat
1      2    7     8      5    dog
2      3    3     6      7  horse
3      2    2     8      3   duck
4      4    4     2      1  mouse
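As a hedged sketch of how `header` and `names` interact, the snippet below reads in-memory stand-ins for excited.csv (the `io.StringIO` buffers and the short header `w,r,b,g,a` are assumptions made for the example). When a file already contains a header row, passing `header=0` together with `names` replaces that row's labels instead of reading it as data.

```python
import io
import pandas as pd

cols = ["white", "red", "blue", "green", "animal"]

# In-memory stand-in for excited.csv without a header row.
no_header = io.StringIO("1,5,2,3,cat\n2,7,8,5,dog\n")
named = pd.read_csv(no_header, names=cols)

# Stand-in for a file that already has a header row; header=0 marks
# that row as the header and names overrides its labels.
with_header = io.StringIO("w,r,b,g,a\n1,5,2,3,cat\n2,7,8,5,dog\n")
renamed = pd.read_csv(with_header, header=0, names=cols)

print(named)
print(renamed)
```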

Creating a hierarchical DataFrame
excited.csv file:

color,status,item1,item2,item3
black,up,3,4,6
black,down,2,6,7
white,up,5,5,5
white,down,3,3,2
white,left,1,2,1
red,up,2,2,2
red,down,1,1,4

>>> csvframe = pd.read_csv('E:\\Python\\Codes\\excited.csv', index_col=['color', 'status'])
>>> csvframe
              item1  item2  item3
color status                     
black up          3      4      6
      down        2      6      7
white up          5      5      5
      down        3      3      2
      left        1      2      1
red   up          2      2      2
      down        1      1      4
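Once `index_col` has built a two-level index, rows can be selected by the outer label or by a full (color, status) pair. A minimal sketch, assuming a shortened in-memory copy of the file above:

```python
import io
import pandas as pd

csv_text = """color,status,item1,item2,item3
black,up,3,4,6
black,down,2,6,7
white,up,5,5,5
white,down,3,3,2
"""
frame = pd.read_csv(io.StringIO(csv_text), index_col=["color", "status"])

# Select every row under the outer label 'black'.
black_rows = frame.loc["black"]

# Select a single row with a (color, status) tuple.
white_up = frame.loc[("white", "up")]

print(frame.index.nlevels)
print(black_rows)
print(white_up["item1"])
```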

2.2. Writing Data to CSV

Simple write
code.py:

>>> import numpy as np
>>> frame = pd.DataFrame(np.arange(16).reshape((4,4)), columns = ['red', 'blue', 'orange', 'black'], index = ['a', 'b', 'c', 'd'])
>>> frame
   red  blue  orange  black
a    0     1       2      3
b    4     5       6      7
c    8     9      10     11
d   12    13      14     15
>>> frame.to_csv('E:\\Python\\Codes\\excited.csv')

excited.csv:

,red,blue,orange,black
a,0,1,2,3
b,4,5,6,7
c,8,9,10,11
d,12,13,14,15

Note the leading ',' on the first line: the index column has no name, so its header field is written as an empty string.
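The empty first header field can be checked, and optionally filled, without touching the disk. The sketch below relies on `to_csv` returning a string when no path is given; the label `'id'` passed to `index_label` is a name chosen only for this example.

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    columns=["red", "blue", "orange", "black"],
    index=["a", "b", "c", "d"],
)

# With no path, to_csv returns the CSV text as a string.
csv_text = frame.to_csv()
first_line = csv_text.splitlines()[0]

# index_label names the index column (the name 'id' is an assumption).
labelled = frame.to_csv(index_label="id").splitlines()[0]

# index_col=0 turns the first column back into the index when reading.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(first_line)
print(labelled)
print(restored)
```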

Omitting the index and the column names when writing
code.py:

>>> frame.to_csv('E:\\Python\\Codes\\excited.csv', index = False, header = False)

excited.csv:

0,1,2,3
4,5,6,7
8,9,10,11
12,13,14,15

Handling NaN elements
code.py:

>>> frame = pd.DataFrame([[3, 2, np.nan], [np.nan, np.nan, np.nan], [2, 3, 3]], index = ['a', 'b', 'c'], columns = ['red', 'black', 'orange'])
>>> frame
   red  black  orange
a  3.0    2.0     NaN
b  NaN    NaN     NaN
c  2.0    3.0     3.0
>>> frame.to_csv('E:\\Python\\Codes\\excited.csv')

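excited.csv:

,red,black,orange
a,3.0,2.0,
b,,,
c,2.0,3.0,3.0
By default, every NaN is written as an empty field.

Use the na_rep parameter to replace the empty fields:
>>> frame.to_csv('E:\\Python\\Codes\\excited.csv', na_rep = 'lalala')

excited.csv:

,red,black,orange
a,3.0,2.0,lalala
b,lalala,lalala,lalala
c,2.0,3.0,3.0
The first field of the header row is still empty. na_rep only replaces missing data values; that field is the name of the index column, which is unnamed rather than NaN.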
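A placeholder written with `na_rep` comes back as ordinary text unless the reader is told about it. The following sketch, which again works on an in-memory string instead of excited.csv, shows the `na_values` parameter of `read_csv` turning the placeholder back into NaN:

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    [[3, 2, np.nan], [np.nan, np.nan, np.nan], [2, 3, 3]],
    index=["a", "b", "c"],
    columns=["red", "black", "orange"],
)

# Write NaN as the placeholder string instead of an empty field.
csv_text = frame.to_csv(na_rep="lalala")

# Without na_values the placeholder is read back as ordinary text.
as_text = pd.read_csv(io.StringIO(csv_text), index_col=0)

# na_values tells read_csv to treat the placeholder as missing again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values=["lalala"])

print(as_text)
print(restored)
```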

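3. Reading and Writing TXT Files

The data in a TXT file is not necessarily separated by commas or semicolons; in that case, use a regular expression as the separator. A character class is usually combined with a quantifier such as '*' (zero or more) or '+' (one or more), for example '\s*' or '\s+'.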

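Symbol	Meaning
.	Any single character except a newline
\d	Digit
\D	Non-digit character
\s	Whitespace character
\S	Non-whitespace character
\n	Newline
\t	Tab
\uxxxx	Unicode character given by the hexadecimal number xxxx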
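The symbols in the table can be tried directly with Python's standard `re` module before they are passed to pandas. This is a small sketch on a made-up line of text; it also shows why a pattern that can match the empty string, such as `\s*`, is risky as a separator on Python 3.7 and later.

```python
import re

line = " 1   5 2\t3"

# '\s+' requires at least one whitespace character per separator.
fields = re.split(r"\s+", line.strip())

# '\d+' finds the runs of digits directly.
numbers = re.findall(r"\d+", line)

# '\s*' can match the empty string; on Python 3.7+ it therefore also
# splits between adjacent non-whitespace characters.
loose = re.split(r"\s*", "ab c")

print(fields)
print(numbers)
print(loose)
```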

Simple read
excited.txt:

Spaces and tabs are inserted at random:
white red blue green
 1   5 2 3
2 7  8   5
2 3 3 3

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = '\s*')
__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
E:\Python\Python3\lib\site-packages\pandas\io\parsers.py:2137: FutureWarning: split() requires a non-empty pattern match.
  yield pat.split(line.strip())
E:\Python\Python3\lib\site-packages\pandas\io\parsers.py:2139: FutureWarning: split() requires a non-empty pattern match.
  yield pat.split(line.strip())
   white  red  blue  green
0      1    5     2      3
1      2    7     8      5
2      2    3     3      3
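The first attempt ran but emitted warnings, so engine='python' was added as the message suggests. The FutureWarning appears because '\s*' can match an empty string; on Python 3.7 and later such a pattern splits between every character, so '\s+' (one or more whitespace characters) is the safer choice:

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = r'\s+', engine = 'python')
   white  red  blue  green
0      1    5     2      3
1      2    7     8      5
2      2    3     3      3
This works. '*' matches zero or more occurrences, while '+' matches one or more.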
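As the ParserWarning text itself notes, `'\s+'` is the one multi-character separator the default C engine supports, so `engine='python'` is optional with it. A minimal sketch on an in-memory copy of the file (the `io.StringIO` buffer is an assumption for illustration):

```python
import io
import pandas as pd

# Irregular spaces and a tab between the fields, as in excited.txt.
txt = "white red blue green\n 1   5 2 3\n2 7  8   5\n2 3\t3 3\n"

# r'\s+' treats any run of whitespace as a single separator.
frame = pd.read_csv(io.StringIO(txt), sep=r"\s+")
print(frame)
```

Writing the pattern as a raw string (`r'\s+'`) avoids Python interpreting the backslash as the start of a string escape.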

Skipping some lines when reading
excited.txt:

12#$@!%$!$#!@$!@$!@
#$%^$^%$#!
@#%!
white red blue green
!$#$!@$#!@$
 1   5 2 3
2 7  8   5
2 3 3 3
^&##$^@FGSDQAS

>>> pd.read_table('E:\\Python\\Codes\\excited.txt', sep = r'\s+', engine = 'python', skiprows = [0, 1, 2, 4, 8])
   white  red  blue  green
0      1    5     2      3
1      2    7     8      5
2      2    3     3      3
The list holds the zero-based numbers of the lines to skip.
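When the junk lines are not known in advance, the skip list can be computed instead of typed by hand. The sketch below uses a hypothetical rule (keep only lines made of letters, digits, and whitespace) on a shortened in-memory version of the file; the rule and the sample text are assumptions, not part of the original example.

```python
import io
import re
import pandas as pd

raw = "\n".join([
    "12#$@!%",
    "#$%^$^%$#!",
    "white red blue green",
    "!$#$!@$#!@$",
    " 1   5 2 3",
    "2 7  8   5",
    "^&##$^@FGSDQAS",
])

# Hypothetical rule: a usable line contains only letters, digits, whitespace.
clean = re.compile(r"^[A-Za-z0-9\s]+$")
skip = [i for i, line in enumerate(raw.splitlines()) if not clean.match(line)]

# Pass the computed zero-based line numbers to skiprows.
frame = pd.read_csv(io.StringIO(raw), sep=r"\s+", skiprows=skip)
print(skip)
print(frame)
```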

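Reading part of the data
It turns out that sep can also be used with read_csv. nrows sets how many data rows to read; for example, nrows=3 reads 3 rows.
chunksize splits the file into pieces; with chunksize=3, each piece holds 3 rows.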

excited.txt:

white red blue green black orange golden
 1   5 2 3 111 222 233
100 7    8   5 2333 23333 233333
20 3 3 3 12222 1222 23232
2000 7   8   5 2333 23333 233333
300 3 3 3 12222 1222 23232

>>> frame = pd.read_csv('E:\\Python\\Codes\\excited.txt', sep = r'\s+', skiprows=[2], nrows = 3, engine = 'python')
>>> frame
   white  red  blue  green  black  orange  golden
0      1    5     2      3    111     222     233
1     20    3     3      3  12222    1222   23232
2   2000    7     8      5   2333   23333  233333
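Three data rows are read from the top, and the third line of the file (line number 2, counting the header as line 0) is skipped.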

>>> pieces = pd.read_csv('E:\\Python\\Codes\\excited.txt', sep = r'\s+', chunksize = 2, engine = 'python')
>>> for piece in pieces:
...   print(piece)
...   print(type(piece))
... 
   white  red  blue  green  black  orange  golden
0      1    5     2      3    111     222     233
1    100    7     8      5   2333   23333  233333
<class 'pandas.core.frame.DataFrame'>
   white  red  blue  green  black  orange  golden
2     20    3     3      3  12222    1222   23232
3   2000    7     8      5   2333   23333  233333
<class 'pandas.core.frame.DataFrame'>
   white  red  blue  green  black  orange  golden
4    300    3     3      3  12222    1222   23232
<class 'pandas.core.frame.DataFrame'>
Each chunk holds two rows (the last one holds the remaining row), and every chunk is a DataFrame.
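Chunked reading is mainly useful when a result can be accumulated piece by piece without holding the whole file in memory. A minimal sketch that sums one column across chunks, assuming a trimmed in-memory copy of the file with only three columns:

```python
import io
import pandas as pd

txt = """white red blue
1 5 2
100 7 8
20 3 3
2000 7 8
300 3 3
"""

total = 0
rows_seen = 0
# With chunksize set, read_csv returns an iterator of DataFrames.
for piece in pd.read_csv(io.StringIO(txt), sep=r"\s+", chunksize=2):
    total += int(piece["white"].sum())
    rows_seen += len(piece)

print(total, rows_seen)
```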

3.2. Writing Data to TXT

Writing works the same way as for CSV.
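For example, `to_csv` accepts a `sep` argument, so a tab-separated text file can be produced with the same call. The sketch below writes to a string rather than a file and uses made-up data; both are assumptions for illustration.

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    columns=["white", "red", "blue"],
)

# Use a tab instead of a comma as the field separator.
txt = frame.to_csv(sep="\t", index=False)

# Read it back with the same separator.
restored = pd.read_csv(io.StringIO(txt), sep="\t")

print(txt)
print(restored)
```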
