Python/Pandas (14): Strings and Regular Expressions

split is often used together with strip to trim whitespace from each piece

val='a,b,guido'
val.split(',')

['a', 'b', 'guido']
pieces=[ x.strip() for x in val.split(',')]
pieces

['a', 'b', 'guido']
'::'.join(pieces)
'a::b::guido'

first,second,third=pieces
first+'::'+'::'+second+'::'+third
'a::::b::guido'
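
The sample string above has no whitespace around its commas, so strip has nothing to remove there. A minimal sketch, assuming a hypothetical input `raw` that does contain stray spaces, shows why the two calls are usually paired:

```python
# Hypothetical input with inconsistent whitespace around the commas.
raw = 'a, b,  guido '

# split alone keeps the surrounding whitespace in each piece.
unstripped = raw.split(',')

# Stripping each piece removes leading and trailing whitespace.
pieces = [x.strip() for x in raw.split(',')]

# join glues the cleaned pieces back together with a new delimiter.
joined = '::'.join(pieces)

print(unstripped)
print(pieces)
print(joined)
```

Using join is generally preferable to chaining `+`, since it works for any number of pieces.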

in, find, index: locating substrings
The difference between find and index: if the substring is not found, index raises an exception, whereas find returns -1

'guido' in val
True
val.index(',')
1


val.find(':')
-1

val.find(',')
1

val.index(':')

ValueError                                Traceback (most recent call last)
<ipython-input-162-280f8b2856ce> in <module>()
----> 1 val.index(':')

ValueError: substring not found
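
Because index raises instead of returning a sentinel, calling code has to decide how to handle a miss. The sketch below wraps it in a small helper; `locate` is a hypothetical name introduced only for illustration, not part of the standard library:

```python
val = 'a,b,guido'

def locate(text, sub):
    """Return the position of sub in text, or None if it is absent."""
    try:
        return text.index(sub)
    except ValueError:
        # index signals "not found" with an exception rather than -1.
        return None

comma_pos = locate(val, ',')
colon_pos = locate(val, ':')

print(comma_pos)
print(colon_pos)
print(val.find(':'))
```

Returning None avoids a common pitfall with find: -1 is itself a valid index (the last character), so using the result of find directly in a slice can silently give wrong results.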

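count returns the number of occurrences of a substring
replace substitutes one pattern for another; it is also commonly used to delete a pattern by passing an empty string as the replacement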

val.replace(',','::')
val.replace(',','')
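
As a small sketch of both methods on the same sample string (the variable names below are chosen for illustration only):

```python
val = 'a,b,guido'

# Number of non-overlapping occurrences of the substring.
n_commas = val.count(',')

# Replace every comma with a double colon.
swapped = val.replace(',', '::')

# Delete every comma by replacing it with the empty string.
deleted = val.replace(',', '')

# The optional third argument limits how many occurrences are replaced.
first_only = val.replace(',', '::', 1)

print(n_commas, swapped, deleted, first_only)
```

Strings are immutable, so replace returns a new string each time and `val` itself is unchanged.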


Regular expressions

The functions in the re module fall into three categories: pattern matching, substitution, and splitting

import re
text="foo bar \t baz  \tqux"
re.split(r'\s+',text)
['foo', 'bar', 'baz', 'qux']

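When re.split is called, the regular expression is first compiled, and then its split method is called on text. You can compile the regex yourself with re.compile to get a reusable regex object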

regex=re.compile(r'\s+')
regex.split(text)

To get a list of everything the pattern matched, use findall

regex.findall(text)
[' ', ' \t ', '  \t']
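
Compiling once pays off mainly when the same pattern is applied to many strings. A minimal sketch, assuming a hypothetical list `lines` of inputs to tokenize:

```python
import re

# Compile the whitespace pattern once; the raw string keeps \s intact.
whitespace = re.compile(r'\s+')

# Hypothetical inputs; the first is the sample text from above.
lines = ['foo bar \t baz  \tqux', 'one\ttwo   three']

# Reuse the same compiled object for every line.
tokens = [whitespace.split(line) for line in lines]
gaps = [whitespace.findall(line) for line in lines]

print(tokens)
print(gaps)
```

The re module also caches recently used patterns internally, so the gain from re.compile is often more about readability than speed.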

findall returns all matches in the string
search returns only the first match
match matches only at the beginning of the string

text="""
Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

regex=re.compile(pattern, flags=re.IGNORECASE)

regex.findall(text)

['dave@google.com', 'steve@gmail.com', 'rob@gmail.com', 'ryan@yahoo.com']

m=regex.search(text)
m
<_sre.SRE_Match object; span=(6, 21), match='dave@google.com'>

text[m.start():m.end()]
'dave@google.com'

print (regex.match(text))
None
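
match returns None here because the text begins with a newline, which the pattern cannot match at position 0. Since both search and match can return None, it is usually safer to check the result before using it. A minimal sketch with a shortened version of the sample text:

```python
import re

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex = re.compile(pattern, flags=re.IGNORECASE)

# Shortened sample text; note the leading newline.
text = "\nDave dave@google.com\nSteve steve@gmail.com\n"

at_start = regex.match(text)            # anchored at position 0
first = regex.search(text)              # scans forward for the first hit
bare = regex.match('dave@google.com')   # succeeds: address is at the start

# Guard against None before calling methods on the match object.
first_address = first.group() if first else None

print(at_start)
print(first_address)
print(bare.group() if bare else None)
```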

sub replaces every occurrence of the pattern with the given string and returns the new string

print (regex.sub('REDACTED',text))

Dave REDACTED
Steve REDACTED
Rob REDACTED
Ryan REDACTED
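
Two related options are sometimes useful: the count argument of sub limits how many matches are replaced, and subn also reports how many replacements were made. A short sketch on a shortened copy of the sample text:

```python
import re

regex = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}',
                   flags=re.IGNORECASE)

# Shortened sample text with two addresses.
text = "Dave dave@google.com\nSteve steve@gmail.com\n"

# subn returns a (new_string, number_of_replacements) tuple.
redacted, n = regex.subn('REDACTED', text)

# count=1 replaces only the first match.
only_first = regex.sub('REDACTED', text, count=1)

print(redacted)
print(n)
print(only_first)
```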

groups returns a tuple with the text matched by each parenthesized group in the pattern

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex=re.compile(pattern,flags=re.IGNORECASE)
m=regex.match('wesm@bright.net')
m.groups()

('wesm', 'bright', 'net')

When the pattern contains groups, findall returns a list of tuples

regex.findall(text)

[('dave', 'google', 'com'),
 ('steve', 'gmail', 'com'),
 ('rob', 'gmail', 'com'),
 ('ryan', 'yahoo', 'com')]
print (regex.sub(r'Username: \1, Domain:\2, Suffix:\3',text))

Dave Username: dave, Domain:google, Suffix:com
Steve Username: steve, Domain:gmail, Suffix:com
Rob Username: rob, Domain:gmail, Suffix:com
Ryan Username: ryan, Domain:yahoo, Suffix:com
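
Numbered references such as \1 become hard to read as a pattern grows. One alternative, shown here as a sketch rather than as part of the original notes, is to name the groups with the `(?P<name>...)` syntax; the group names below are chosen for illustration:

```python
import re

# Same email pattern, but each group carries a name.
named = re.compile(
    r'(?P<username>[A-Z0-9._%+-]+)@(?P<domain>[A-Z0-9.-]+)\.(?P<suffix>[A-Z]{2,4})',
    flags=re.IGNORECASE,
)

m = named.match('wesm@bright.net')

# groupdict maps each group name to the text it matched.
parts = m.groupdict()

# In a replacement string, named groups are referenced as \g<name>.
rewritten = named.sub(r'\g<username> at \g<domain>', 'wesm@bright.net')

print(parts)
print(rewritten)
```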

