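If you browse the Pattern class specification, you will see tables summarizing the supported regular expression constructs. Table 13-1 describes character classes.
The left-hand column lists the regular expression constructs, and the right-hand column describes the conditions under which each construct matches.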
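Table 13-1 Character Classes
Construct | Description
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z, or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, but not m through p: [a-lq-z] (subtraction)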
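Note: The word "class" in the phrase "character class" does not refer to a .class file. In the context of regular expressions, a character class is a set of characters enclosed in square brackets. It specifies the characters that will successfully match a single character of a given input string.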
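The equivalences stated in Table 13-1 can be checked mechanically. The following is a minimal sketch, not part of the original text: the class name CharClassEquivalence and the helper sameAsciiSet are hypothetical names introduced for illustration, and the comparison is limited to the 128 ASCII characters by assumption.

```java
import java.util.regex.Pattern;

public class CharClassEquivalence {
    // Hypothetical helper: true if both single-character patterns
    // accept exactly the same set of ASCII characters (0..127).
    static boolean sameAsciiSet(String left, String right) {
        Pattern a = Pattern.compile(left);
        Pattern b = Pattern.compile(right);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Union, and the two subtraction rows of the table.
        System.out.println(sameAsciiSet("[a-d[m-p]]", "[a-dm-p]"));
        System.out.println(sameAsciiSet("[a-z&&[^bc]]", "[ad-z]"));
        System.out.println(sameAsciiSet("[a-z&&[^m-p]]", "[a-lq-z]"));
    }
}
```

Each call compares the nested form from the left column with the flattened form given in the description column.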
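Simple Classes
The most basic form of a character class is a set of characters placed side by side within square brackets. For example, the regular expression [bcr]at matches the words "bat", "cat", or "rat" because it defines a character class (accepting "b", "c", or "r") as its first character: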
Enter your regex: [bcr]at
Enter input string to search: bat
I found the text "bat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: cat
I found the text "cat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.
Enter your regex: [bcr]at
Enter input string to search: hat
No match found.
In the examples above, the overall match succeeds only when the first letter matches one of the characters defined by the character class.
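The same checks can be run outside the interactive test harness. This is a minimal sketch using java.util.regex directly; the class name SimpleClassDemo and the helper firstMatch are hypothetical names, not taken from the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleClassDemo {
    // Hypothetical helper: returns the first substring of input that
    // matches regex, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"bat", "cat", "rat", "hat"}) {
            String found = firstMatch("[bcr]at", word);
            System.out.println(word + " -> " + (found == null ? "No match found." : found));
        }
    }
}
```

Only "hat" reports no match, because "h" is not a member of the class [bcr].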
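1. Negation
To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation: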
Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3.
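In these examples, the match succeeds only if the first character of the input string is not one of the characters defined by the character class.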
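One detail worth seeing in code is that a negated class still has to consume exactly one character; it does not match "nothing". The sketch below illustrates this under the assumption that Matcher.find() is used, as in the transcripts; the class name NegationDemo and the helper span are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegationDemo {
    // Hypothetical helper: returns "text@start-end" for the first match,
    // or "none" when the regex does not match anywhere in the input.
    static String span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() + "@" + m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        System.out.println(span("[^bcr]at", "hat"));  // hat@0-3
        System.out.println(span("[^bcr]at", "bat"));  // none
        // No character precedes "at", so the negated class has nothing to match.
        System.out.println(span("[^bcr]at", "at"));   // none
        // find() scans forward: "t" then "h" fails at index 0, "hat" matches at index 1.
        System.out.println(span("[^bcr]at", "that")); // hat@1-4
    }
}
```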
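2. Ranges
Sometimes you will want to define a character class that includes a range of values, such as the letters "a" through "h" or the digits "1" through "5". To specify a range, insert the "-" metacharacter between the first and last character to be matched, as in [1-5] or [a-h]; both endpoints are included. You can also place several ranges side by side within the class to expand the set of possible matches. For example, [a-zA-Z] matches any letter of the basic Latin alphabet: a through z (lowercase) or A through Z (uppercase).
Here are some examples of ranges and negation: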
Enter your regex: [a-c]
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: b
I found the text "b" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: c
I found the text "c" starting at index 0 and ending at index 1.
Enter your regex: [a-c]
Enter input string to search: d
No match found.
Enter your regex: foo[1-5]
Enter input string to search: foo1
I found the text "foo1" starting at index 0 and ending at index 4.
Enter your regex: foo[1-5]
Enter input string to search: foo5
I found the text "foo5" starting at index 0 and ending at index 4.
Enter your regex: foo[1-5]
Enter input string to search: foo6
No match found.
Enter your regex: foo[^1-5]
Enter input string to search: foo1
No match found.
Enter your regex: foo[^1-5]
Enter input string to search: foo6
I found the text "foo6" starting at index 0 and ending at index 4.
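The transcripts above can be approximated with a few lines of Java. This is a rough sketch rather than the actual test harness: the class name RangeDemo and the helper describe are hypothetical, and the message format is copied from the transcripts by assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeDemo {
    // Hypothetical helper: reports the first match in the same wording
    // as the transcripts, including the start and end indices.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "No match found.";
        }
        return String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                m.group(), m.start(), m.end());
    }

    public static void main(String[] args) {
        System.out.println(describe("foo[1-5]", "foo5"));   // range, endpoint included
        System.out.println(describe("foo[1-5]", "foo6"));   // outside the range
        System.out.println(describe("foo[^1-5]", "foo6"));  // negated range
        System.out.println(describe("[a-zA-Z]", "42Q"));    // two ranges side by side
    }
}
```

Note that the end index reported by Matcher.end() is exclusive, which is why a one-character match at index 0 ends at index 1.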
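3. Unions
You can also use unions to create a single character class composed of two or more separate character classes. To create a union, nest one class inside the other, as in [0-4[6-8]]. This particular union creates a single character class that matches the digits 0, 1, 2, 3, 4, 6, 7, and 8.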
Enter your regex: [0-4[6-8]]
Enter input string to search: 0
I found the text "0" starting at index 0 and ending at index 1.
Enter your regex: [0-4[6-8]]
Enter input string to search: 5
No match found.
Enter your regex: [0-4[6-8]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.
Enter your regex: [0-4[6-8]]
Enter input string to search: 8
I found the text "8" starting at index 0 and ending at index 1.
Enter your regex: [0-4[6-8]]
Enter input string to search: 9
No match found.
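To see the whole set a union accepts at once, each digit can be tested in a loop. This is an illustrative sketch; the class name UnionDemo and the helper acceptedDigits are hypothetical names.

```java
import java.util.regex.Pattern;

public class UnionDemo {
    // Hypothetical helper: returns the digits 0-9 accepted by a
    // single-character regex, concatenated in ascending order.
    static String acceptedDigits(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder accepted = new StringBuilder();
        for (char c = '0'; c <= '9'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                accepted.append(c);
            }
        }
        return accepted.toString();
    }

    public static void main(String[] args) {
        System.out.println(acceptedDigits("[0-4[6-8]]")); // 01234678
        // The flattened form with two adjacent ranges accepts the same digits.
        System.out.println(acceptedDigits("[0-46-8]"));   // 01234678
    }
}
```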
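4. Intersections
To create a single character class that matches only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This intersection creates a single character class that matches only the digits common to both character classes: 3, 4, and 5.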
Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.
Enter your regex: [0-9&&[345]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.
Enter your regex: [0-9&&[345]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.
Enter your regex: [0-9&&[345]]
Enter input string to search: 2
No match found.
Enter your regex: [0-9&&[345]]
Enter input string to search: 6
No match found.
The following example shows the intersection of two ranges:
Enter your regex: [2-8&&[4-6]]
Enter input string to search: 3
No match found.
Enter your regex: [2-8&&[4-6]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.
Enter your regex: [2-8&&[4-6]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.
Enter your regex: [2-8&&[4-6]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.
Enter your regex: [2-8&&[4-6]]
Enter input string to search: 7
No match found.
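Both intersections can be summarized by filtering the ten digits through each pattern. The sketch below assumes Java 8 or later for streams and Pattern.asPredicate(); the class name IntersectionDemo and the helper acceptedDigits are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class IntersectionDemo {
    // Hypothetical helper: keeps only the digits that the regex accepts.
    // asPredicate() uses find(), which is equivalent to a full match
    // here because every tested string is a single character.
    static String acceptedDigits(String regex) {
        Pattern p = Pattern.compile(regex);
        return "0123456789".chars()
                .mapToObj(c -> String.valueOf((char) c))
                .filter(p.asPredicate())
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(acceptedDigits("[0-9&&[345]]")); // 345
        System.out.println(acceptedDigits("[2-8&&[4-6]]")); // 456
    }
}
```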
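5. Subtraction
Finally, you can use subtraction to exclude one or more nested character classes, as in [0-9&&[^345]]. This example creates a single character class that matches every digit from 0 to 9 except 3, 4, and 5: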
Enter your regex: [0-9&&[^345]]
Enter input string to search: 2
I found the text "2" starting at index 0 and ending at index 1.
Enter your regex: [0-9&&[^345]]
Enter input string to search: 3
No match found.
Enter your regex: [0-9&&[^345]]
Enter input string to search: 4
No match found.
Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.
Enter your regex: [0-9&&[^345]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.
Enter your regex: [0-9&&[^345]]
Enter input string to search: 9
I found the text "9" starting at index 0 and ending at index 1.
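Subtraction is convenient when masking characters. As a small sketch (the class name SubtractionDemo and the helper mask are hypothetical), String.replaceAll can replace every digit the class accepts and leave the subtracted digits untouched:

```java
public class SubtractionDemo {
    // Hypothetical helper: replaces each character matched by regex with '-'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "-");
    }

    public static void main(String[] args) {
        // 3, 4, and 5 are subtracted from the class, so they survive.
        System.out.println(mask("[0-9&&[^345]]", "0123456789")); // ---345----
    }
}
```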
Now that we have covered how character classes are created, you may want to review Table 13-1 before continuing with the next section.