Regex Study Notes --- 08

2.2.5. Adding Commas to a Number with Lookaround

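This section introduces a new concept: lookaround.

By definition, a lookaround construct matches no characters; it matches only a position in the text.

Lookaround comes in two kinds: lookahead and lookbehind.

Both are described in more detail in the subsections below.

For now, it is enough to know that lookahead looks forward from the current position (toward the right), while lookbehind looks back from it (toward the left).

To make sense of the new term, think of lookaround as working like the word boundary 「\b」 and the anchors 「^」 and 「$」, only more general.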
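To make "matches a position, not characters" concrete, here is a minimal sketch that prints the text each pattern actually consumes. The helper name `matched_text` and the sample string are assumptions made for illustration; the offsets come from Perl's built-in `@-` and `@+` arrays.

```perl
use strict;
use warnings;

# Return the text consumed by a match, or undef if there is no match.
# (matched_text is a hypothetical helper, not part of the original notes.)
sub matched_text {
    my ($text, $regex) = @_;
    return $text =~ $regex ? substr($text, $-[0], $+[0] - $-[0]) : undef;
}

my $sample = "Jeffs";

# Without lookaround, the trailing "s" is part of the match.
print "plain:     '", matched_text($sample, qr/Jeffs/), "'\n";

# With lookahead, the "s" is inspected but not consumed.
print "lookahead: '", matched_text($sample, qr/Jeff(?=s)/), "'\n";

# Lookarounds alone match a position, so the consumed text is empty.
print "position:  '", matched_text($sample, qr/(?<=Jeff)(?=s)/), "'\n";
```

The three lines should show 'Jeffs', 'Jeff', and an empty string, in that order.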

2.2.5.1 A few more lookahead examples

First, an example that does not use lookaround:

s/\bJeffs\b/Jeff's/g

This matches the whole word "Jeffs" and replaces it with "Jeff's".

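The same problem solved with lookahead:

s/\bJeff(?=s\b)/Jeff'/g

The regex first matches 「Jeff」 and then attempts the lookahead.

The overall match succeeds only if 「s\b」 can match at that position, that is, only if "Jeff" is immediately followed by an "s" and a word boundary.

In other words, 「Jeff」 determines the text that is matched, while the lookahead merely "selects" a position.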

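The same problem again, this time with lookbehind added:

s/(?<=\bJeff)(?=s\b)/'/g

Here the regex consumes no text at all; the two lookarounds together pick out the position between "Jeff" and "s", which is where the apostrophe is inserted.

The approaches to this problem can be summarized as follows: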

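Solution	Comments
s/\bJeffs\b/Jeff's/g	The most obvious solution; uses no lookaround, and the regex "consumes" the entire "Jeffs".
s/\b(Jeff)(s)\b/$1'$2/g	Only adds capture variables, with no extra benefit.
s/\bJeff(?=s\b)/Jeff'/g	Does not consume the "s"; of little practical value beyond illustrating lookahead.
s/(?<=\bJeff)(?=s\b)/'/g	Consumes no text at all; uses lookbehind and lookahead together to match the required position, which is where the apostrophe goes. Well suited to explaining lookaround.
s/(?=s\b)(?<=\bJeff)/'/g	Equivalent to the previous expression, with the two lookaround constructs swapped. Because neither consumes any characters, the order makes no difference.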
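As a quick check that the five substitutions in the table really are interchangeable, the sketch below applies each one to the same input. The sample sentence and the `@solutions` list are assumptions introduced for this illustration.

```perl
use strict;
use warnings;

# The five substitutions from the table, each applied to its own copy of the text.
my @solutions = (
    sub { my ($t) = @_; $t =~ s/\bJeffs\b/Jeff's/g;      return $t },
    sub { my ($t) = @_; $t =~ s/\b(Jeff)(s)\b/$1'$2/g;   return $t },
    sub { my ($t) = @_; $t =~ s/\bJeff(?=s\b)/Jeff'/g;   return $t },
    sub { my ($t) = @_; $t =~ s/(?<=\bJeff)(?=s\b)/'/g;  return $t },
    sub { my ($t) = @_; $t =~ s/(?=s\b)(?<=\bJeff)/'/g;  return $t },
);

# Sample sentence made up for this sketch.
my $input = "Jeffs book is on Jeffs desk.";

for my $i (0 .. $#solutions) {
    printf "%d: %s\n", $i + 1, $solutions[$i]->($input);
}
```

Every line of output should read "Jeff's book is on Jeff's desk."; the solutions differ only in how much text the regex consumes along the way.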

With that, let's put lookaround to practical use.

2.2.5.2 The comma example

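Large numbers are easier to read when commas are inserted into them.

For example, in "The US population is 298444215", the number looks more natural written as "298,444,215".

A comma belongs at every position that has a digit on its left and a number of digits on its right that is an exact multiple of three.

A digit on the left calls for lookbehind: 「(?<=\d)」

A multiple of three digits on the right calls for lookahead: 「(?=(\d\d\d)+$)」

The complete substitution looks like this:

s/(?<=\d)(?=(\d\d\d)+$)/,/g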
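To see which positions the two lookarounds select, the following sketch collects the offsets at which the pattern matches instead of substituting anything. The helper name `comma_positions` is hypothetical; `pos` is Perl's built-in for the offset where the last `/g` match ended.

```perl
use strict;
use warnings;

# Collect every offset at which both lookarounds succeed.
# (comma_positions is a hypothetical helper name for this sketch.)
sub comma_positions {
    my ($digits) = @_;
    my @positions;
    # Each match is zero-width, so pos() is the selected position itself.
    while ($digits =~ /(?<=\d)(?=(\d\d\d)+$)/g) {
        push @positions, pos($digits);
    }
    return @positions;
}

print join(", ", comma_positions("298444215")), "\n";
```

For "298444215" this reports offsets 3 and 6, exactly the places where the commas of "298,444,215" go.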

Here is a short program to verify it:

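#! /usr/bin/perl -w

# Mastering Regular Expressions: Chapter 2 Section 2.
# fourth program

# Compute a sample value to format: 12345 * 1987 = 24529515.
my $testVal = 12345 * 1987;

# Insert a comma wherever a digit is on the left and a multiple of three digits runs to the end.
$testVal =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "This Value is $testVal decimal.\n";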

Output:

This Value is 24,529,515 decimal.
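The same substitution can be wrapped in a small function and tried on numbers of different lengths. This is only a sketch: the name `commify` and the list of test values are assumptions, and the function expects a string consisting of digits only.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution above.
# Expects a string of digits only, because the lookahead is anchored with $.
sub commify {
    my ($number) = @_;
    $number =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
    return $number;
}

for my $value (7, 100, 1234, 298444215, 12345 * 1987) {
    printf "%-10s -> %s\n", $value, commify($value);
}
```

Values with three digits or fewer come back unchanged, since no position has both a digit on the left and a multiple of three digits on the right.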

2.2.5.3 Word boundaries and negative lookaround

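Let's make it a little harder: what if the number sits in the middle of a string?

$testStr = "This Value is 24529515 decimal.";

The end-of-line anchor 「$」 no longer works here, because the digits are not followed by the end of the string; the word boundary 「\b」 should be used instead.

The modified program follows: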

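#! /usr/bin/perl -w

# Mastering Regular Expressions: Chapter 2 Section 2.
# fourth program

my $testVal = 12345 * 1987;

# The number now sits in the middle of the string.
my $testStr = "This Value is $testVal decimal.";

# \b lets the groups of three digits end at a word boundary instead of the end of the string.
$testStr =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;

print "$testStr\n";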
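A side-by-side run makes the difference between the two anchors visible. In this sketch the variable names `$with_dollar` and `$with_boundary` are assumptions; the input string is the one used in this section.

```perl
use strict;
use warnings;

my $text = "This Value is 24529515 decimal.";

# Anchored with $: the digits are not at the end of the string, so nothing changes.
(my $with_dollar = $text) =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;

# Anchored with \b: the digit groups only need to end at a word boundary.
(my $with_boundary = $text) =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;

print "$with_dollar\n";
print "$with_boundary\n";
```

The first line is printed unchanged, while the second contains "24,529,515".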

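The lookahead and lookbehind used so far are more precisely called positive lookahead and positive lookbehind, because they succeed when their subexpression can match at the position in question.

Are there lookarounds that succeed when the subexpression cannot match? Yes.

They are negative lookahead and negative lookbehind, as shown in the table below: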

Four Types of Lookaround

Type	Regex	Successful if the enclosed subexpression...
Positive Lookbehind	(?<=...)	can match to the left
Negative Lookbehind	(?<!...)	cannot match to the left
Positive Lookahead	(?=...)	can match to the right
Negative Lookahead	(?!...)	cannot match to the right
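The four types can be compared on one string by counting how often 「cat」 matches under each condition. Everything in this sketch is illustrative: the sample text, the helper `count_matches`, and the four patterns are assumptions, not taken from the original notes.

```perl
use strict;
use warnings;

# Count the matches of a regex in a string (hypothetical helper for this sketch).
sub count_matches {
    my ($text, $regex) = @_;
    my $count = 0;
    $count++ while $text =~ /$regex/g;
    return $count;
}

my $text = "cat catnip bobcat";   # sample text made up for illustration

my @cases = (
    [ 'positive lookbehind (?<=bob)cat', qr/(?<=bob)cat/ ],
    [ 'negative lookbehind (?<!bob)cat', qr/(?<!bob)cat/ ],
    [ 'positive lookahead  cat(?=nip)',  qr/cat(?=nip)/  ],
    [ 'negative lookahead  cat(?!nip)',  qr/cat(?!nip)/  ],
);

for my $case (@cases) {
    my ($label, $regex) = @$case;
    printf "%-34s %d\n", $label, count_matches($text, $regex);
}
```

Each positive form counts one match and each negative form counts the remaining two, so every pair adds up to the three occurrences of "cat" in the text.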

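Let's make it harder still: what if letters sit right next to the digits?

$testStr = "This Value is 24529515Hz decimal.";

There is no word boundary between "5" and "H", so 「\b」 no longer matches after the digits. Negative lookahead is a better fit here:

(?!\d)

It requires only that the groups of digits are not followed by another digit. The revised substitution:

s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g

The modified program follows: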

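#! /usr/bin/perl -w

# Mastering Regular Expressions: Chapter 2 Section 2.
# fourth program

# The digits are followed directly by letters, so \b would not match after them.
my $testVal = "24529515Hz";	# 12345 * 1987, with a unit appended

my $testStr = "This Value is $testVal decimal.";

# (?!\d) only requires that the digit groups are not followed by another digit.
$testStr =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;

print "$testStr\n";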
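To confirm that the negative lookahead handles cases the word boundary misses, this sketch runs both variants over a few inputs. The helper names `commify_boundary` and `commify_negative` and the sample strings other than the "Hz" one are assumptions made for illustration.

```perl
use strict;
use warnings;

# Variant ending the digit groups at a word boundary (hypothetical helper name).
sub commify_boundary {
    my ($text) = @_;
    $text =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
    return $text;
}

# Variant ending the digit groups where no further digit follows (hypothetical helper name).
sub commify_negative {
    my ($text) = @_;
    $text =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;
    return $text;
}

for my $text ("This Value is 24529515Hz decimal.",
              "This Value is 24529515 decimal.",
              "Population: 298444215") {
    print "boundary: ", commify_boundary($text), "\n";
    print "negative: ", commify_negative($text), "\n";
}
```

Only the "Hz" input separates the two: the boundary variant leaves it untouched, while the negative lookahead variant still produces "24,529,515Hz".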