Splitting a string in Python while keeping the separators: in Python, how do I split a string and keep the separators?

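This article covers several ways to split strings in Python while keeping the separators, using regular expressions, custom functions, and list comprehensions, including for more complex strings.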


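Wrap the pattern in a capturing group, and re.split returns the separators as part of the result list:

    >>> import re
    >>> re.split(r'(\W)', 'foo/bar spam\neggs')
    ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']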
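To make the effect of the capturing group explicit, here is a minimal sketch that runs the same pattern with and without parentheses on the sample string above. The variable names are only illustrative.

```python
import re

text = 'foo/bar spam\neggs'

# Without a capturing group, the matched separators are discarded.
without_group = re.split(r'\W', text)

# With a capturing group, the matched separators are returned in the list too.
with_group = re.split(r'(\W)', text)

print(without_group)  # ['foo', 'bar', 'spam', 'eggs']
print(with_group)     # ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

# Nothing was dropped, so joining the pieces reproduces the input.
assert ''.join(with_group) == text
```

The round-trip check at the end is a quick way to confirm that a splitting approach really keeps every separator.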

If you are splitting on newlines, use splitlines(True).

    >>> 'line 1\nline 2\nline without newline'.splitlines(True)
    ['line 1\n', 'line 2\n', 'line without newline']

(This is not a general solution, but it is added here in case someone arrives without knowing that this method exists.)

Another solution without regular expressions, which works well on Python 3:

    # Split strings and keep separator
    test_strings = ['', 'Hi', '', ''+unicode_max_char+''
    print(split_and_keep(ridiculous_string, '
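As a regex-free illustration, here is a minimal sketch built on str.partition. It is not the split_and_keep function referenced above, whose body is not shown here; the name split_keep_sep and the sample inputs are hypothetical, and this version returns each separator as its own list element.

```python
def split_keep_sep(text, sep):
    """Split text on a literal separator, keeping each separator as its own item.

    Hypothetical helper for illustration only.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    while True:
        # partition returns (before, separator, after); the separator is ''
        # when it was not found in the remaining text.
        head, found, text = text.partition(sep)
        parts.append(head)
        if not found:
            break
        parts.append(found)
    return parts


print(split_keep_sep('a,b,,c', ','))  # ['a', ',', 'b', ',', '', ',', 'c']
print(split_keep_sep('Hi', '<'))      # ['Hi']
print(split_keep_sep('', '<'))        # ['']
```

Like str.split, this sketch keeps empty strings between adjacent separators, so joining the result always gives back the original string.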

    # This keeps all separators in the result
    import re

    st = "%%(c+dd+e+f-1523)%%7"
    # Character class of separators: + - / * % ( )
    sh = re.compile(r'[+\-/*%()]')

    def splitStringFull(sh, st):
        pieces = sh.split(st)
        result = []
        start = 0
        for piece in pieces:
            if not piece:
                continue
            # Search from `start` so a repeated piece is not found at an earlier position
            k = st.find(piece, start)
            if k > start:
                # The text between the previous piece and this one is a run of separators
                result.append(st[start:k])
            result.append(piece)
            start = k + len(piece)
        if start < len(st):
            # Keep a trailing run of separators as well
            result.append(st[start:])
        return result

    li = splitStringFull(sh, st)
    # li == ['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']

You can also split a string using a list of strings instead of a regular expression, as shown here:

    def tokenizeString(aString, separators):
        # separators is a list of strings used to split the string.
        # Sort by ascending length: the loop below keeps the last match,
        # so the longest matching separator wins. sorted() leaves the
        # caller's list untouched.
        separators = sorted(separators, key=len)
        listToReturn = []
        i = 0
        while i < len(aString):
            theSeparator = ""
            for current in separators:
                if current == aString[i:i+len(current)]:
                    theSeparator = current
            if theSeparator != "":
                listToReturn += [theSeparator]
                i = i + len(theSeparator)
            else:
                if listToReturn == []:
                    listToReturn = [""]
                if listToReturn[-1] in separators:
                    listToReturn += [""]
                listToReturn[-1] += aString[i]
                i += 1
        return listToReturn

    print(tokenizeString(aString = "\"\"\"hi\"\"\" hello + world += (1*2+3/5) '''hi'''", separators = ["'''", '+=', '+', "/", "*", "\\'", '\\"', "-=", "-", " ", '"""', "(", ")"]))
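For comparison, a list of literal separators can also be turned into a regular expression. The following is a rough sketch of that alternative, not the approach above: it escapes each separator with re.escape, tries longer separators first, and wraps the alternation in a capturing group. The name tokenize_with_regex and the sample input are hypothetical.

```python
import re


def tokenize_with_regex(text, separators):
    """Split text on any of the literal separators and keep them in the result.

    Hypothetical helper for illustration only.
    """
    # Longest first, so that '+=' is tried before '+'.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape makes characters such as '+' or '(' match literally.
    pattern = '(' + '|'.join(re.escape(sep) for sep in ordered) + ')'
    # re.split yields empty strings between adjacent separators; drop them.
    return [token for token in re.split(pattern, text) if token]


print(tokenize_with_regex('a += b+c', ['+', '+=', ' ']))
# ['a', ' ', '+=', ' ', 'b', '+', 'c']
```

Ordering the alternatives by descending length matters because a regex alternation takes the first branch that matches, not the longest one.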

If you want to split a string and keep the separators when the regular expression has no capturing group:

    import re

    def finditer_with_separators(regex, s):
        matches = []
        prev_end = 0
        for match in regex.finditer(s):
            # Add the text between the previous separator and this one, if any
            if match.start() > prev_end:
                matches.append(s[prev_end:match.start()])
            matches.append(match.group())
            prev_end = match.end()
        # Add whatever follows the last separator
        if prev_end < len(s):
            matches.append(s[prev_end:])
        return matches

    regex = re.compile(r"[\(\)]")
    s = "f(a, b)"  # sample input, not part of the original snippet
    matches = finditer_with_separators(regex, s)

If the regular expression is assumed to be wrapped in a capturing group:

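    import re

    def split_with_separators(regex, s):
        # filter(None, ...) drops the empty strings that re.split produces
        matches = list(filter(None, regex.split(s)))
        return matches

    regex = re.compile(r"([\(\)])")
    s = "f(a, b)"  # sample input, not part of the original snippet
    matches = split_with_separators(regex, s)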

Both approaches also drop the empty strings, which are useless and annoying in most cases.
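To see where those empty strings come from, the short sketch below splits a sample string in which separators sit next to each other and at both ends. The input "(a)(b)" is a hypothetical example chosen for illustration.

```python
import re

regex = re.compile(r"([()])")
s = "(a)(b)"  # hypothetical sample input

# re.split emits an empty string wherever two separators are adjacent
# or a separator sits at the very start or end of the input.
raw = regex.split(s)
print(raw)      # ['', '(', 'a', ')', '', '(', 'b', ')', '']

# filter(None, ...) keeps only the non-empty items.
cleaned = list(filter(None, raw))
print(cleaned)  # ['(', 'a', ')', '(', 'b', ')']
```

If the empty strings carry meaning in your data, for example to mark an empty field, skip the filtering step.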

If you have only one separator, you can use a list comprehension:

    text = 'foo,bar,baz,qux'
    sep = ','

Appending or prepending the separator:

    result = [x + sep for x in text.split(sep)]
    # ['foo,', 'bar,', 'baz,', 'qux,']
    # to get rid of the trailing separator
    result[-1] = result[-1][:-len(sep)]
    # ['foo,', 'bar,', 'baz,', 'qux']

    result = [sep + x for x in text.split(sep)]
    # [',foo', ',bar', ',baz', ',qux']
    # to get rid of the leading separator
    result[0] = result[0][len(sep):]
    # ['foo', ',bar', ',baz', ',qux']

The separator as its own element:

    result = [u for x in text.split(sep) for u in (x, sep)]
    # ['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
    result = result[:-1]  # to get rid of the trailing separator
    # ['foo', ',', 'bar', ',', 'baz', ',', 'qux']
