Notes on the split functions in Python

shadow_dreamer

License: CC 4.0 BY-SA

Category: Python. Tags: python, split, function

Article link: https://blog.youkuaiyun.com/shadow_dreamer/article/details/51582418

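This article compares Python's standard str.split with re.split, covering what each one does and when to use it, and shows how to use them to process text data, particularly data in irregular formats.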

Preface

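While studying the decision tree algorithm I needed to extract data from a text file. Seeing text in the following format, the first thing I thought of was the split function: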

sunny      hot             high           false       N

sunny      hot             high           true         N

overcast  hot             high           false       P

rain           mild           high           false       P

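The data set is fairly long, and normalizing the spaces between columns by hand would take a while, so I tried str.split with a single space as the separator, using the first row as a test.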
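The test can be sketched roughly as follows. This is a minimal reconstruction, assuming the call was split(' ') on the first sample row; the variable names line and parts are illustrative and not from the original post.

```python
# Assumed reconstruction of the test: split the first row on one space.
line = "sunny      hot             high           false       N"

# With an explicit ' ' separator, every single space is a delimiter,
# so each extra space between two columns produces an empty string.
parts = line.split(' ')
print(parts)
print(len(parts))  # far more than the 5 fields in the row

# A tiny case that is easy to check by hand: two spaces, one empty string.
tiny = "a  b".split(' ')
print(tiny)  # ['a', '', 'b']
```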

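The result was certainly not what I wanted: the list was full of empty strings. My first thought was that I was misusing the function, so I ran help(str.split):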

The str.split function explained

split(...)
    S.split([sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

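In short, the function splits the string on sep and returns the pieces as a list. If maxsplit is given, at most maxsplit splits are performed, so the returned list has at most maxsplit + 1 elements. For example: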

>>> u = "www.doiido.com.cn"
>>> # split once
>>> u.split('.', 1)
['www', 'doiido.com.cn']
>>> # split twice
>>> u.split('.', 2)
['www', 'doiido', 'com.cn']

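With an explicit single-space separator, this function cannot produce the list I wanted in one pass (although, as the help text says, calling split() with no sep treats any run of whitespace as one separator and drops the empty strings). Searching online, I found that split is not unique to strings: another important module provides one too, namely the regular expression module re.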

The re.split function

Before using this function, import the re module:

>>> import re
>>> help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.

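The key parameter of this function is pattern: the string is split wherever the given pattern matches. (For the pattern syntax, look up regular expressions; I will not go into it here.) Using the same example as above: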

>>> line = 'sunny      hot             high           false       N'
>>> re.split(r'\s+', line)
['sunny', 'hot', 'high', 'false', 'N']

OK, that gives the correct result.
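For comparison, here is a small sketch of how re.split with the pattern \s+ relates to str.split() called with no arguments, which the help text quoted earlier describes as treating any run of whitespace as a separator. The padded string is an invented example, not part of the original data.

```python
import re

line = "sunny      hot             high           false       N"

# On a line without leading or trailing whitespace, both approaches agree.
by_regex = re.split(r'\s+', line)
by_default = line.split()
print(by_regex == by_default)  # True

# They differ when the line is padded: re.split keeps an empty string
# at each end, while str.split() with no arguments discards them.
padded = "  sunny   hot  "
regex_padded = re.split(r'\s+', padded)
default_padded = padded.split()
print(regex_padded)    # ['', 'sunny', 'hot', '']
print(default_padded)  # ['sunny', 'hot']
```

For plain whitespace-separated columns, split() with no arguments is usually enough; re.split becomes useful when the delimiter needs a real pattern, such as a mix of commas and spaces.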

While searching I also came across another frequently used split function, one for splitting file paths:

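The os.path.split() function
Syntax: os.path.split('PATH')

Parameter:

PATH is the full path of a file. The function returns a (head, tail) pair:
If PATH consists of a directory and a file name, it returns the directory path and the file name.
If PATH is a directory name ending with a path separator, it returns the directory path and an empty file name.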
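The two cases above can be sketched as follows. The paths are hypothetical and chosen only for illustration. Note that a directory path without a trailing separator is split like any other path, so its last component comes back as the tail.

```python
import os.path

# Directory plus file name: returns (directory path, file name).
head, tail = os.path.split('/home/user/data.txt')
print(head, tail)

# Directory name ending with a separator: the tail is an empty string.
dir_head, dir_tail = os.path.split('/home/user/')
print(repr(dir_head), repr(dir_tail))

# Without the trailing separator, the last component becomes the tail.
no_slash = os.path.split('/home/user')
print(no_slash)
```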