Emoji regex: matching Unicode emoji in Python regular expressions

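In Python, you can use a regular expression to extract the text between a number and an emoji. To match a span of Unicode emoji, build a character class that covers the code point range, such as `[\u263a-\U0001f645]`. For example, the regex `\d+(.*?)[\u263a-\U0001f645]` extracts the content between a number and any character in that range.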

I need to extract the text between a number and an emoticon in a string.

Example text:

blah xzuyguhbc ibcbb bqw 2 extract1 ☺️ jbjhcb 6 extract2 🙅 bjvcvvv

Expected output:

extract1

extract2

The regex I wrote extracts the text between two numbers. I need to change it so that it recognizes the Unicode emoji characters and extracts the text between a number and an emoji instead.

(?<=[\s][\d])(.*?)(?=[\d])

Please suggest a Python-friendly method. I need it to work with all emoji, not only the ones given in the example.

Solution

Since there are many emoji with different Unicode code points, you have to either list them explicitly in your regex or, if they fall within a specific range, use a character class. In this case the second symbol (🙅, \U0001f645) has a much higher code point than \u263a (the code point of ☺), so the two are far apart. To match only these two, list them as alternatives:

In [70]: import re

In [71]: s = 'blah xzuyguhbc ibcbb bqw 2 extract1 ☺️ jbjhcb 6 extract2 🙅 bjvcvvv'

In [72]: regex = re.compile(r'\d+(.*?)(?:\u263a|\U0001f645)')

In [74]: regex.findall(s)

Out[74]: [' extract1 ', ' extract2 ']

Or, if you want to match more emoji, you can use a character range (a good reference that shows the code point ranges for different emoji is http://apps.timwhitlock.info/emoji/tables/unicode):

In [75]: regex = re.compile(r'\d+(.*?)[\u263a-\U0001f645]')

In [76]: regex.findall(s)

Out[76]: [' extract1 ', ' extract2 ']

Note that in the second case you have to make sure that every character within that range is an emoji you want to match. The range from \u263a to \U0001f645 also contains many characters that are not emoji.
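To make that caveat concrete, here is a small sketch using the same broad range. The sample string is made up for illustration; it contains no emoji at all, only a CJK character whose code point (U+4E2D) happens to fall inside the range:

```python
import re

# The broad range from the answer: U+263A through U+1F645.
broad = re.compile(r'\d+(.*?)[\u263a-\U0001f645]')

# '\u4e2d' (中) is a CJK ideograph, not an emoji, but it lies inside
# the range, so the pattern treats it as a terminating "emoji".
sample = 'order 3 apples \u4e2d pears'
matches = broad.findall(sample)
print(matches)  # [' apples ']
```

So the wide range is convenient, but it will also fire on ordinary non-Latin text.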

Here is another example:

In [77]: s = "blah 4 xzuyguhbc 😺 ibcbb bqw 2 extract1 ☺️ jbjhcb 6 extract2 🙅 bjvcvvv"

In [78]: regex = re.compile(r'\d+(.*?)[\u263a-\U0001f645]')

In [79]: regex.findall(s)

Out[79]: [' xzuyguhbc ', ' extract1 ', ' extract2 ']
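If the wide range is too permissive, one option is to combine several narrower Unicode blocks into a single character class. The sketch below is only an illustration: the list of blocks is an assumption and is not exhaustive (and some of these blocks also contain symbols that are not emoji), and `extract_before_emoji` is a hypothetical helper name. It also moves the surrounding whitespace outside the capture group so the results come back trimmed:

```python
import re

# Assumed, non-exhaustive set of Unicode blocks that contain emoji.
# Adjust this list to the emoji you actually expect in your data.
EMOJI_CLASS = (
    '['
    '\u2600-\u27bf'          # Miscellaneous Symbols and Dingbats
    '\U0001f300-\U0001f5ff'  # Miscellaneous Symbols and Pictographs
    '\U0001f600-\U0001f64f'  # Emoticons
    '\U0001f680-\U0001f6ff'  # Transport and Map Symbols
    '\U0001f900-\U0001f9ff'  # Supplemental Symbols and Pictographs
    ']'
)

# Digits, optional whitespace, the lazily captured text, optional
# whitespace, then one character from the emoji class.
PATTERN = re.compile(r'\d+\s*(.*?)\s*' + EMOJI_CLASS)


def extract_before_emoji(text):
    """Return the text found between each number and the next emoji."""
    return PATTERN.findall(text)


s = 'blah xzuyguhbc ibcbb bqw 2 extract1 \u263a\ufe0f jbjhcb 6 extract2 \U0001f645 bjvcvvv'
print(extract_before_emoji(s))  # ['extract1', 'extract2']
```

With these narrower blocks, a CJK character such as 中 (U+4E2D) no longer counts as an emoji.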
