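《Modern Python Cookbook》(Python经典实例) Notes 1.7: Parsing Strings with Regular Expressions

This post shows how to put Python's re module to practical use. Working through an example that breaks down a complex string, it explains each step, from analyzing the text and writing a pattern to performing the match.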

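Python ships with a built-in regular expression module, re.

The simplest way to break down a complex string is to generalize the string into a pattern, then write a regular expression that describes that pattern.

Example:

Suppose we want to split up text from a recipe website. Each line looks like this: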

ingredient = "Kumquat: 2 cups" 

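The steps are as follows:

(1) Analyze the text and summarize the pattern. The text can be divided into the following three parts:
(ingredient words): (amount digits) (unit words) 

(2) Import the re module.

(3) Rewrite the pattern as a regular expression.

(4) Compile the pattern.

(5) Match the pattern against the text.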

In [1]: import re
In [2]: ingredient = 'Kumquat: 2 cups'
In [3]: pattern_text = r'(?P<ingredient>\w+):\s+(?P<amount>\d+)\s+(?P<unit>\w+)'
In [4]: pattern = re.compile(pattern_text)
In [5]: match = pattern.match(ingredient)
In [6]: pattern
Out[6]: re.compile(r'(?P<ingredient>\w+):\s+(?P<amount>\d+)\s+(?P<unit>\w+)', re.UNICODE)
In [7]: match
Out[7]: <re.Match object; span=(0, 15), match='Kumquat: 2 cups'>
In [8]: match.groups()
Out[8]: ('Kumquat', '2', 'cups')

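Code walkthrough:

Regular expressions contain many \ characters, so the pattern string is written with an r prefix. This makes it a raw string, which keeps Python from interpreting the backslashes as escape sequences.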
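As a small illustration of why the r prefix matters, the sketch below compares the same pattern written with and without it. The variable names and the sample text are made up for this example.

```python
import re

# In a normal string literal, "\b" is the backspace character (0x08),
# so the regex engine never sees the word-boundary escape \b.
plain = "\bcup\b"
raw = r"\bcup\b"

print(len(plain))  # 5
print(len(raw))    # 7

text = "2 cups or 1 cup"
print(re.findall(plain, text))  # []
print(re.findall(raw, text))    # ['cup']
```

Escapes such as \d and \w happen to survive in a normal string because Python does not define them, but relying on that is fragile, so using r'...' for every pattern is the safer habit.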

The (?P<name>...) construct in the regular expression marks each group and gives a name to the content it captures.
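Named groups can also be read back by name. The following sketch assumes the sample line from the start of the recipe and a pattern that allows whitespace after the colon; group() and groupdict() are standard methods of match objects.

```python
import re

pattern = re.compile(r'(?P<ingredient>\w+):\s+(?P<amount>\d+)\s+(?P<unit>\w+)')
match = pattern.match("Kumquat: 2 cups")

# Look up a single group by the name given in (?P<name>...)
print(match.group('ingredient'))  # Kumquat
print(match.group('amount'))      # 2

# Or collect every named group into a dictionary
fields = match.groupdict()
print(fields)  # {'ingredient': 'Kumquat', 'amount': '2', 'unit': 'cups'}
```

Accessing groups by name tends to be more robust than by position, because the code keeps working if another group is later added to the pattern.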

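\w+ matches one or more word characters, \d+ matches one or more digits, and \s+ matches one or more whitespace characters.

The captured values are returned by groups() as a tuple of strings.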
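One detail worth keeping in mind is that match() returns None when the text does not fit the pattern, so calling groups() directly can fail. A minimal sketch of a wrapper follows; parse_ingredient is a hypothetical helper name, not something from the book, and converting the amount to int is an added assumption.

```python
import re
from typing import Optional, Tuple

PATTERN = re.compile(r'(?P<ingredient>\w+):\s+(?P<amount>\d+)\s+(?P<unit>\w+)')

def parse_ingredient(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (ingredient, amount, unit), or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    name, amount, unit = match.groups()
    # groups() always yields strings, so convert the amount explicitly
    return name, int(amount), unit

print(parse_ingredient("Kumquat: 2 cups"))  # ('Kumquat', 2, 'cups')
print(parse_ingredient("no amount here"))   # None
```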

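Regular expression basics:

Regular expressions can describe many kinds of string patterns.
We have already seen some character classes:
 \w matches any letter, digit, or underscore (a to z, A to Z, 0 to 9, and _; for str patterns in Python 3 it also matches Unicode letters and digits);
 \d matches any decimal digit;
 \s matches any whitespace character, such as a space, tab, or newline.

These classes have complements:
 \W matches any character that is not a letter, digit, or underscore;
 \D matches any character that is not a digit;
 \S matches any character that is not whitespace.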
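To see the six classes side by side, the sketch below runs each of them (with a + suffix, meaning one or more) over the sample ingredient line using re.findall. The variable names are illustrative only.

```python
import re

text = "Kumquat: 2 cups"

words = re.findall(r'\w+', text)       # runs of word characters
digits = re.findall(r'\d+', text)      # runs of digits
spaces = re.findall(r'\s+', text)      # runs of whitespace
non_words = re.findall(r'\W+', text)   # runs of anything but word characters
non_digits = re.findall(r'\D+', text)  # runs of anything but digits
non_spaces = re.findall(r'\S+', text)  # runs of anything but whitespace

print(words)       # ['Kumquat', '2', 'cups']
print(digits)      # ['2']
print(spaces)      # [' ', ' ']
print(non_words)   # [': ', ' ']
print(non_digits)  # ['Kumquat: ', ' cups']
print(non_spaces)  # ['Kumquat:', '2', 'cups']
```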

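There are also some special symbols:

+ as a suffix matches one or more of the preceding pattern. \d+ matches one or more digits. To match a literal + character, use \+.
* as a suffix matches zero or more of the preceding pattern. \w* matches zero or more word characters. To match a literal * character, use \*.
? as a suffix matches zero or one of the preceding expression. This character is also used elsewhere with a slightly different meaning: in (?P<name>...), it sits inside the () and defines special properties of the group.
. matches any single character (by default, any character except a newline). To match a literal ., use \..
[] matches any one element of the set.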
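The symbols above can be tried out one at a time. This is a small sketch with made-up sample strings; each line exercises a single symbol.

```python
import re

# + : one or more digits
plus = re.findall(r'\d+', 'a1 b22 c')

# * : zero or more word characters, so even the empty string matches
star = re.fullmatch(r'\w*', '') is not None

# ? : the trailing "s" is optional
optional = re.findall(r'cups?', '1 cup, 2 cups')

# . : exactly one character of any kind between "c" and "p"
dot = re.findall(r'c.p', 'cup cap c-p cp')

# \. and \+ : escaped, so they match the literal characters
literal_dot = re.findall(r'\d\.\d', '1.5 and 2x5')
literal_plus = re.findall(r'\d\+\d', '1+2 and 3-4')

# [] : any one character from the set
vowels = re.findall(r'[aeiou]', 'Kumquat')

print(plus)          # ['1', '22']
print(star)          # True
print(optional)      # ['cup', 'cups']
print(dot)           # ['cup', 'cap', 'c-p']
print(literal_dot)   # ['1.5']
print(literal_plus)  # ['1+2']
print(vowels)        # ['u', 'u', 'a']
```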

 
