Series.str methods and their uses

License: CC 4.0 BY-SA

This article introduces the string methods available through Series.str in Pandas, covering concatenation, splitting, element access, joining, membership tests, replacement, repetition, padding, slicing, pattern matching, counting, and case conversion. It walks through methods such as cat(), split(), get(), join(), contains(), replace(), pad(), slice(), count(), startswith(), findall(), and extract() to show how to process the strings in a Series efficiently.

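Series.str is an accessor of type pandas.core.strings.StringMethods (the exact module path varies between pandas versions). It does not convert the Series itself into a str; instead it applies string operations to each element of the Series and passes missing values through as NaN. The methods it provides are described below.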

1. cat(): concatenate strings

from pandas import Series
a=Series(['a','b','c'])
print(a)
# Output:
0    a
1    b
2    c
dtype: object

a1=a.str.cat(['A','B','C'],sep='.')
a2=a.str.cat(sep=',')
a3=a.str.cat([['x','y','z'],['1','2','3']],sep=',')
print(a1)
print(a2)
print(a3)
# Output:
0    a.A
1    b.B
2    c.C
dtype: object
a,b,c
0    a,x,1
1    b,y,2
2    c,z,3
dtype: object
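
The examples above contain no missing values. As a small sketch of how cat() treats NaN, the snippet below uses a hypothetical Series `b` (not part of the original example) and the `na_rep` parameter to substitute a placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value, used only for illustration
b = pd.Series(['a', np.nan, 'c'])

# Joining the whole Series into one string: missing values are skipped by default
joined_default = b.str.cat(sep=',')
# na_rep substitutes a placeholder for each missing value
joined_filled = b.str.cat(sep=',', na_rep='-')
print(joined_default)
print(joined_filled)

# Element-wise concatenation: a missing value yields NaN unless na_rep is given
pairs_default = b.str.cat(['A', 'B', 'C'], sep='.')
pairs_filled = b.str.cat(['A', 'B', 'C'], sep='.', na_rep='-')
print(pairs_default)
print(pairs_filled)
```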

2. split(): split strings, returning a list for each element

from pandas import Series
import numpy as np
s=Series(['a_b_c','c_d_e',np.nan,'f_g_h'])
print(s)
# Output:
0    a_b_c
1    c_d_e
2      NaN
3    f_g_h
dtype: object

s1=s.str.split('_')      # n defaults to -1, which means no limit
s2=s.str.split('_',n=0)  # n=-1 and n=0 both split on every separator
s3=s.str.split('_',n=1)  # split at most once
print(s1)
print(s2)
print(s3)
# Output:
0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object
0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object
0    [a, b_c]
1    [c, d_e]
2         NaN
3    [f, g_h]
dtype: object
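
When the pieces are needed as separate columns rather than lists, split() also accepts `expand=True`. A minimal sketch, reusing the same data as above (the variable names `parts` and `first_rest` are only illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# expand=True spreads the pieces across DataFrame columns instead of lists
parts = s.str.split('_', expand=True)
print(parts)

# n=1 limits the number of splits, so only two columns are produced
first_rest = s.str.split('_', n=1, expand=True)
print(first_rest)
```

The row that held NaN stays missing in every column.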

3. get(): extract the element at the given position from each string

print(s.str.get(0))
# Output:
0      a
1      c
2    NaN
3      f
dtype: object

Indexing or slicing with str[...] is another way to select by position:

print(s.str[0:2])
# Output:
0     a_
1     c_
2    NaN
3     f_
dtype: object

4. join(): join the items of each element with the given separator (for a string element, the items are its characters)

print(s.str.join('!'))
# Output:
0    a!_!b!_!c
1    c!_!d!_!e
2          NaN
3    f!_!g!_!h
dtype: object

5. contains(): test whether each string contains a pattern (a regular expression by default)

print(s.str.contains('d'))
# Output:
0    False
1     True
2      NaN
3    False
dtype: object
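
Because the missing value comes back as NaN, the result above cannot be used directly as a boolean mask. A short sketch of two parameters that help here, `na` and `regex`, on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# na=False treats missing values as "no match", so the result can filter rows
mask = s.str.contains('d', na=False)
print(s[mask])

# The pattern is a regular expression by default: [de] matches 'd' or 'e'
as_regex = s.str.contains('[de]', na=False)
# regex=False looks for the literal text '[de]' instead
as_literal = s.str.contains('[de]', regex=False, na=False)
print(as_regex)
print(as_literal)
```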

6. replace(): replace occurrences of a substring or pattern

print(s.str.replace('_','.'))
# Output:
0    a.b.c
1    c.d.e
2      NaN
3    f.g.h
dtype: object
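
Whether the first argument is read as literal text or as a regular expression depends on the `regex` parameter, whose default has changed across pandas versions, so passing it explicitly is the safer habit. A small sketch on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# Literal replacement: every '_' becomes '.'
literal = s.str.replace('_', '.', regex=False)
print(literal)

# Regular-expression replacement: every character in the range a-c becomes 'X'
pattern = s.str.replace(r'[a-c]', 'X', regex=True)
print(pattern)
```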

7. repeat(): repeat each string the given number of times

print(s.str.repeat(3))
# Output:
0    a_b_ca_b_ca_b_c
1    c_d_ec_d_ec_d_e
2                NaN
3    f_g_hf_g_hf_g_h
dtype: object

8. Padding

(1) pad(width, side='left', fillchar=' '): pad on the left, right, or both sides

print(s.str.pad(10,fillchar='?',side='right'))
# Output:
0    a_b_c?????
1    c_d_e?????
2           NaN
3    f_g_h?????
dtype: object

print(s.str.pad(10,fillchar='?',side='left'))
# Output:
0    ?????a_b_c
1    ?????c_d_e
2           NaN
3    ?????f_g_h
dtype: object

print(s.str.pad(10,fillchar='?',side='both'))
# Output:
0    ??a_b_c???
1    ??c_d_e???
2           NaN
3    ??f_g_h???
dtype: object

(2) center(): pad on both sides so the string is centered

print(s.str.center(10,fillchar='?'))
# Output:
0    ??a_b_c???
1    ??c_d_e???
2           NaN
3    ??f_g_h???
dtype: object

(3) ljust(): pad on the right; rjust(): pad on the left

print(s.str.ljust(10,fillchar='?'))
# Output:
0    a_b_c?????
1    c_d_e?????
2           NaN
3    f_g_h?????
dtype: object
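
Only ljust() is shown above. For completeness, a minimal sketch of rjust() on the same data (the name `right_aligned` is only illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# rjust() right-aligns each string, filling the left side up to width 10
right_aligned = s.str.rjust(10, fillchar='?')
print(right_aligned)
```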

(4) zfill(): pad on the left with zeros

print(s.str.zfill(10))
# Output:
0    00000a_b_c
1    00000c_d_e
2           NaN
3    00000f_g_h
dtype: object

9. wrap(): insert newline characters so that each line is at most the given width

print(s.str.wrap(3))
# Output:
0    a_b\n_c
1    c_d\n_e
2        NaN
3    f_g\n_h
dtype: object

10. slice(): cut a substring out of each string using the given start and stop positions

print(s.str.slice(1,3))
# Output:
0     _b
1     _d
2    NaN
3     _g
dtype: object

11. slice_replace(): replace the characters between the given positions with the given string

print(s.str.slice_replace(1,3,"?"))
# Output:
0    a?_c
1    c?_e
2     NaN
3    f?_h
dtype: object

12. count(): count how many times a pattern (a regular expression) occurs in each string

print(s.str.count('a'))
# Output:
0    1.0
1    0.0
2    NaN
3    0.0
dtype: float64

13. startswith(): test whether each string starts with the given string; endswith(): test whether each string ends with the given string

print(s.str.startswith('a'))
# Output:
0     True
1    False
2      NaN
3    False
dtype: object
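
endswith() works the same way. The sketch below also passes `na=False`, assuming the goal is a clean boolean mask for filtering:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# Without na, the missing value stays NaN in the result
ends_e = s.str.endswith('e')
print(ends_e)

# With na=False, the missing value counts as False and the mask can filter rows
ends_e_mask = s.str.endswith('e', na=False)
print(s[ends_e_mask])
```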

14. findall(): find all matches of a regular expression in each string, returned as a list

print(s.str.findall('[a-z]'))
# Output:
0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object

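15. match(): test whether the beginning of each string matches the given regular expression (use fullmatch() to require that the whole string matches)

print(s.str.match('[d-z]'))
# Output: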
0    False
1    False
2      NaN
3     True
dtype: object
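
The difference between matching at the start and matching the whole string is easy to miss. A short sketch comparing match() with fullmatch() on the same data, with `na=False` so that the results are plain booleans:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# match() only requires the pattern to match at the start of the string
starts = s.str.match('[d-z]', na=False)
# fullmatch() requires the pattern to cover the entire string
whole_single = s.str.fullmatch('[d-z]', na=False)
whole_all = s.str.fullmatch('[a-z_]+', na=False)

print(starts)
print(whole_single)
print(whole_all)
```

Here 'f_g_h' passes match('[d-z]') because it starts with 'f', but fails fullmatch('[d-z]') because the pattern describes a single character only.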

16. extract(): extract the first match of each capture group; the parts you want to extract must be wrapped in parentheses

print(s.str.extract("([d-z])"))
# Output:
     0
0  NaN
1    d
2  NaN
3    f
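
With more than one capture group, extract() returns one column per group, and named groups become the column names. A minimal sketch; the group names `first` and `second` are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# Two named groups: the letters before and after the first underscore
pieces = s.str.extract(r'(?P<first>[a-z])_(?P<second>[a-z])')
print(pieces)
```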

17. len(): compute the length of each string

print(s.str.len())
# Output:
0    5.0
1    5.0
2    NaN
3    5.0
dtype: float64

18. strip(): remove leading and trailing whitespace; rstrip(): remove trailing whitespace; lstrip(): remove leading whitespace

name=Series(['Jack  ','jill ','jesse ','frank  '])
print(name.str.strip())
# Output:
0     Jack
1     jill
2    jesse
3    frank
dtype: object
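
The one-sided variants and the optional `to_strip` argument are not shown above. A small sketch with hypothetical sample data chosen to make the differences visible:

```python
import pandas as pd

# Hypothetical data: whitespace on both sides, plus one value wrapped in 'x'
name = pd.Series(['  Jack  ', ' jill ', 'xxjessexx'])

left = name.str.lstrip()      # removes leading whitespace only
right = name.str.rstrip()     # removes trailing whitespace only
custom = name.str.strip('x')  # removes the given characters instead of whitespace

print(left.tolist())
print(right.tolist())
print(custom.tolist())
```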

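19. partition(): split each string at the first occurrence of the separator and return a DataFrame. The result always has three parts: the text before the separator, the separator itself, and the text after it.

print(s.str.partition('_'))
# Output:
     0    1    2
0    a    _  b_c
1    c    _  d_e
2  NaN  NaN  NaN
3    f    _  g_h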

# rpartition() splits at the last occurrence of the separator
print(s.str.rpartition('_'))
# Output:
    0    1    2
0  a_b    _    c
1  c_d    _    e
2  NaN  NaN  NaN
3  f_g    _    h

20. lower(): convert to lowercase; upper(): convert to uppercase

print(s.str.upper())
# Output:
0    A_B_C
1    C_D_E
2      NaN
3    F_G_H
dtype: object

21. find(): search from the left and return the position of the given substring; -1 means it was not found

print(s.str.find('d'))
# Output:
0   -1.0
1    2.0
2    NaN
3   -1.0
dtype: float64


# rfind() searches from the right and returns the highest matching position
print(s.str.rfind('e'))
# Output:
0   -1.0
1    4.0
2    NaN
3   -1.0
dtype: float64

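22. index(): return the position of the given substring. Note that, unlike find(), it raises an error if any string does not contain the substring.

print(s.str.index('_'))
# Output: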
0    1.0
1    1.0
2    NaN
3    1.0
dtype: float64

# rindex() returns the highest position at which the substring is found
print(s.str.rindex('_'))
# Output:
0    3.0
1    3.0
2    NaN
3    3.0
dtype: float64
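
To make the difference from find() concrete, the sketch below searches for 'd', which is missing from two of the strings. It assumes the error surfaces as Python's ValueError, as it does for the built-in str.index():

```python
import numpy as np
import pandas as pd

s = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# find() reports a missing substring as -1
found = s.str.find('d')
print(found)

# index() raises instead, because 'a_b_c' does not contain 'd'
try:
    s.str.index('d')
    raised = False
except ValueError as exc:
    raised = True
    print('index() failed:', exc)
```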

23. capitalize(): convert the first character to uppercase and the rest to lowercase

print(s.str.capitalize())
# Output:
0    A_b_c
1    C_d_e
2      NaN
3    F_g_h
dtype: object

24. swapcase(): swap uppercase and lowercase

print(s.str.swapcase())
# Output:
0    A_B_C
1    C_D_E
2      NaN
3    F_G_H
dtype: object

25. isalnum(): whether all characters are letters or digits (False here because of the underscores)

print(s.str.isalnum())
# Output:
0    False
1    False
2      NaN
3    False
dtype: object

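26. Other methods

isalpha(): whether all characters are letters
isdigit(): whether all characters are digits
isspace(): whether all characters are whitespace
islower(): whether all cased characters are lowercase
isupper(): whether all cased characters are uppercase
istitle(): whether the string is title-cased (each word starts with an uppercase letter followed by lowercase letters)
isnumeric(): whether all characters are numeric (digits plus other numeric characters such as fractions)
isdecimal(): whether all characters are decimal digits; a string containing a decimal point, such as '1.5', returns False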
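
The three numeric checks are easy to confuse. A small sketch on hypothetical sample values shows where they differ. The Series is created with dtype=object on the assumption that this keeps Python's own str semantics regardless of the pandas version or string backend:

```python
import pandas as pd

# Hypothetical sample values; dtype=object is an assumption made so that the
# checks follow Python's built-in str.isdecimal/isdigit/isnumeric behaviour
t = pd.Series(['123', '1.5', '²', '½', 'abc'], dtype=object)

checks = pd.DataFrame({
    'value': t,
    'isdecimal': t.str.isdecimal(),  # plain decimal digits only
    'isdigit': t.str.isdigit(),      # also accepts digits such as superscript two
    'isnumeric': t.str.isnumeric(),  # also accepts fractions such as one half
})
print(checks)
```

None of the three accepts '1.5', because the decimal point is not a numeric character.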