Regular Expressions

Latest recommended article published on 2021-08-11 21:43:18

ISJINHAO

Category: Java SE Basics; tags: regular expressions

Article link: https://blog.youkuaiyun.com/qq_38206090/article/details/83280497

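Overview

A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract substrings that satisfy some condition. That description can still be confusing to someone who has never used regular expressions, so consider an analogy. Take the string 123xyz234 and the pattern *^*, and suppose that * stands for a run of digits of any length (including zero) and ^ stands for a run of English letters of any length (including zero). These symbols are made up for this analogy and do not have these meanings in real regular expressions. The string matches the pattern, because 123 matches the first *, xyz matches ^, and 234 matches the second *. Now take another pattern, *^, and use it to extract substrings from the same string: we can extract 123, 123x, 123xyz, 234, and so on, but not z234 or 123xyz234, because everything we extract must fit the pattern. A pattern of this kind is a regular expression.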
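The analogy above can be tried out in real Java syntax. The sketch below assumes one possible translation of the toy symbols: the made-up * (digits of any length) becomes \d*, and the made-up ^ (letters of any length) becomes [a-zA-Z]*. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class OverviewDemo {
    // Assumed translation of the toy pattern "*^*": digits, letters, digits.
    static final Pattern DIGITS_LETTERS_DIGITS = Pattern.compile("\\d*[a-zA-Z]*\\d*");
    // Assumed translation of the toy pattern "*^": digits, then letters.
    static final Pattern DIGITS_LETTERS = Pattern.compile("\\d*[a-zA-Z]*");

    // Returns true only when the whole candidate fits the pattern.
    static boolean fits(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(fits(DIGITS_LETTERS_DIGITS, "123xyz234")); // true

        String[] candidates = {"123", "123x", "123xyz", "234", "z234", "123xyz234"};
        for (String candidate : candidates) {
            // The first four fit "*^"; z234 and 123xyz234 do not.
            System.out.println(candidate + " -> " + fits(DIGITS_LETTERS, candidate));
        }
    }
}
```

The last two candidates are rejected because, once the letters have been consumed, the pattern has nothing left that could match the trailing digits.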

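Syntax

Regular expression syntax differs somewhat between languages, but the similarities far outweigh the differences. Here we use regular expressions as implemented in Java. The syntax is hard to memorize, so only some commonly used constructs are listed here; for anything else, consult the API documentation.

The simplest regular expression

As the analogy above shows, a regular expression is simply a matching rule, so a string of ordinary characters is a matching rule too: it matches exactly itself. For example, the split() method below splits a string around matches of a regular expression, and the string we pass in is that regular expression.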

import java.util.Arrays;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "1234567890";
		/**
		 * 	Splits this string around matches of the given regular expression. 
		 */
		String[] split = testStr.split("67");
		System.out.println(Arrays.toString(split)); // [12345, 890]
		
	}
}
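Because the argument of split() is a regular expression rather than a literal delimiter, characters with a special meaning behave differently from ordinary ones. The following is a minimal sketch of that difference; the class and helper names are hypothetical, while String.split and Pattern.quote are standard JDK methods.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitRegexDemo {
    // Treats the second argument as a regular expression, exactly as String.split does.
    static String[] splitAsRegex(String input, String regex) {
        return input.split(regex);
    }

    // Treats the second argument as literal text by quoting it first.
    static String[] splitLiterally(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // "." means "any character", so every character is a delimiter and
        // only empty pieces remain, which split() drops from the end.
        System.out.println(Arrays.toString(splitAsRegex("a.b.c", "."))); // []

        // Quoting the dot makes it an ordinary delimiter.
        System.out.println(Arrays.toString(splitLiterally("a.b.c", "."))); // [a, b, c]

        // A real pattern as the delimiter: one or more digits.
        System.out.println(Arrays.toString(splitAsRegex("a1b22c333d", "\\d+"))); // [a, b, c, d]
    }
}
```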

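Classes related to regular expressions

Although the previous example did not use any regex-related classes directly, that does not mean a regular expression is just a string: internally, split() still calls the regex-related methods.

Pattern: a Pattern object is the compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, call its public static compile method, which returns a Pattern object. That method takes the regular expression as its first argument.
Matcher: a Matcher object is the engine that interprets the pattern and performs match operations on an input string. Like Pattern, Matcher has no public constructor. Call the matcher method of a Pattern object to obtain a Matcher object.
PatternSyntaxException: PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.

Now we redo the example above using the regex-related classes: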

import java.util.Arrays;
import java.util.regex.Pattern;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "1234567890";
		Pattern pattern = Pattern.compile("67");
		String[] split = pattern.split(testStr);
		System.out.println(Arrays.toString(split)); // [12345, 890]
		
	}
}

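PatternSyntaxException is an exception class, but it is unchecked, so the compiler does not force you to handle it; catching it here is optional. Read on for how Matcher is used.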
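When the pattern comes from user input or configuration, catching the exception can still be worthwhile. The sketch below shows one way to do so; the class and method names are hypothetical, and the malformed pattern "[abc" (an unclosed character class) is chosen only for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Returns true if the regex compiles, false if it has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // The exception carries the faulty pattern, a description,
            // and the approximate index of the error.
            System.out.println("Invalid pattern \"" + e.getPattern() + "\": "
                    + e.getDescription() + " (near index " + e.getIndex() + ")");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("67"));   // true
        System.out.println(isValidRegex("[abc")); // false, after printing the error details
    }
}
```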

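Matching

So far we have tested splitting with a regular expression, but splitting is not a very good way to learn regular expressions. From here on we use the matching methods instead, starting with a small example.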

import java.util.regex.Pattern;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "1234567890";
		boolean matches = Pattern.matches("67", testStr);
		System.out.println(matches); // false
	}
}

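The output here is false, because Pattern.matches() returns true only when the entire input matches the pattern, and 12345 and 890 are not matched by the pattern 67.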
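To make the "entire input" requirement concrete, the following sketch compares the three standard Matcher operations: matches() (whole input), lookingAt() (prefix of the input), and find() (anywhere in the input). The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {
    // True only if the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the input starts with a match of the regex.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // True if a match occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String testStr = "1234567890";
        System.out.println(wholeMatch("67", testStr));          // false
        System.out.println(prefixMatch("67", testStr));         // false
        System.out.println(prefixMatch("12", testStr));         // true
        System.out.println(containsMatch("67", testStr));       // true
        // Allowing digits on both sides lets the whole input match.
        System.out.println(wholeMatch("\\d*67\\d*", testStr));  // true
    }
}
```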

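Character classes

[abc] : a, b, or c (simple class)
[^abc] : any character except a, b, or c (negation)
[a-zA-Z] : a through z or A through Z, inclusive (range)
[a-d[m-p]] : a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] : d, e, or f (intersection)
[a-z&&[^bc]] : a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] : a through z, and not m through p: [a-lq-z] (subtraction)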

import java.util.regex.Pattern;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "123";
		System.out.println(Pattern.matches("[123][123][123]", testStr)); 	//true
		System.out.println(Pattern.matches("[^123][123][123]", testStr)); 	//false
		
	}
}
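The union, intersection, and subtraction forms from the list can be checked one character at a time. This is a small sketch with a hypothetical class and helper name; the patterns are the ones listed above.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the whole input matches the given character class.
    static boolean inClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Negation: anything except a, b, or c.
        System.out.println(inClass("[^abc]", "d"));       // true
        System.out.println(inClass("[^abc]", "a"));       // false

        // Union: a-d or m-p.
        System.out.println(inClass("[a-d[m-p]]", "n"));   // true
        System.out.println(inClass("[a-d[m-p]]", "e"));   // false

        // Intersection: a-z and also one of d, e, f.
        System.out.println(inClass("[a-z&&[def]]", "e")); // true
        System.out.println(inClass("[a-z&&[def]]", "a")); // false

        // Subtraction: a-z without b and c.
        System.out.println(inClass("[a-z&&[^bc]]", "d")); // true
        System.out.println(inClass("[a-z&&[^bc]]", "b")); // false
    }
}
```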

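Predefined character classes

. : any character (may or may not match line terminators)
\d : a digit: [0-9]
\D : a non-digit: [^0-9]
\s : a whitespace character: [ \t\n\x0B\f\r]
\S : a non-whitespace character: [^\s]
\w : a word character: [a-zA-Z_0-9]
\W : a non-word character: [^\w]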

import java.util.regex.Pattern;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "123";
		System.out.println(Pattern.matches("\\d\\d\\d", testStr)); 	//true
		System.out.println(Pattern.matches("\\w\\w\\w", testStr));	//true
	}
}
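The remaining predefined classes, and the "may or may not match line terminators" remark about the dot, can be illustrated as follows. This is only a sketch with hypothetical class and helper names; Pattern.DOTALL is the standard flag that makes the dot match line terminators as well.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // True if the whole input matches the regex with default flags.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the whole input matches the regex compiled with DOTALL.
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\D", "a"));    // true: a letter is not a digit
        System.out.println(matches("\\s", "\t"));   // true: tab is whitespace
        System.out.println(matches("\\S", " "));    // false: space is whitespace
        System.out.println(matches("\\W", "_"));    // false: underscore is a word character
        System.out.println(matches("\\W", "-"));    // true: hyphen is not a word character

        // By default the dot does not match a line terminator.
        System.out.println(matches(".", "\n"));        // false
        System.out.println(matchesDotAll(".", "\n"));  // true
    }
}
```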

Quantifiers

X? : X, once or not at all
X* : X, zero or more times
X+ : X, one or more times
X{n} : X, exactly n times
X{n,} : X, at least n times
X{n,m} : X, at least n but not more than m times

import java.util.regex.Pattern;

public class RegTest {
	public static void main(String[] args) {
		
		String testStr = "123";
		boolean matches = Pattern.matches("[123]{3}", testStr); //true
		System.out.println(matches);
		
	}
}
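Each quantifier in the list can be exercised against inputs just inside and just outside its bounds. The sketch below uses a hypothetical class and helper name, with inputs chosen only for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a?", ""));            // true: zero occurrences allowed
        System.out.println(matches("a?", "aa"));          // false: at most one
        System.out.println(matches("a*", "aaa"));         // true: zero or more
        System.out.println(matches("a+", ""));            // false: at least one required
        System.out.println(matches("\\d{3}", "123"));     // true: exactly three digits
        System.out.println(matches("\\d{3,}", "12"));     // false: fewer than three
        System.out.println(matches("\\d{2,4}", "123"));   // true: between two and four
        System.out.println(matches("\\d{2,4}", "12345")); // false: more than four
    }
}
```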

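Finding substrings

Finding substrings requires both Pattern and Matcher. The example below searches the following string for every ffid field: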
[flid=1415279, ffid=BK-2898-20180922-A, frtt=20180922210700, frlt=20180923000300][flid=1417032, ffid=OD-689-20180923-D, fatt=2401, stat=BOR, ista=BOR]

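import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegTest {
	// "ffid=" followed by 2 word characters, 3 to 6 word characters, 8 digits and 1 word character, separated by hyphens
	public static final String FFID = "((ffid=){1})\\w{2}-\\w{3,6}-\\d{8}-\\w";
	public static void main(String[] args) {
		String str = "[flid=1415279, ffid=BK-2898-20180922-A, frtt=20180922210700, frlt=20180923000300][flid=1417032, ffid=OD-689-20180923-D, fatt=2401, stat=BOR, ista=BOR]";
		Pattern pattern = Pattern.compile(FFID);
		Matcher matcher = pattern.matcher(str);
		// Loop to find every matching substring
		while(matcher.find()) {
			// group(0) is the whole match: ffid=BK-2898-20180922-A, then ffid=OD-689-20180923-D
			System.out.println(matcher.group(0));
		}
	}
}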
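The example above prints the whole match with group(0). If the individual parts of each ffid value are needed, capturing groups can pull them out. The following is a sketch under that assumption: the pattern is a variation of FFID with one pair of parentheses per dash-separated part, and the class, constant, and method names are hypothetical. What each part means is not stated in the source, so the parts are simply joined with slashes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FfidGroupsDemo {
    // One capturing group per dash-separated part of the ffid value.
    static final Pattern FFID_PARTS =
            Pattern.compile("ffid=(\\w{2})-(\\w{3,6})-(\\d{8})-(\\w)");

    // Returns the four captured parts of every ffid field, joined with "/".
    static List<String> extractParts(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = FFID_PARTS.matcher(input);
        while (matcher.find()) {
            // group(1)..group(4) are the parenthesized parts, in order.
            results.add(matcher.group(1) + "/" + matcher.group(2) + "/"
                    + matcher.group(3) + "/" + matcher.group(4));
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "[flid=1415279, ffid=BK-2898-20180922-A, frtt=20180922210700, frlt=20180923000300]"
                + "[flid=1417032, ffid=OD-689-20180923-D, fatt=2401, stat=BOR, ista=BOR]";
        // Prints [BK/2898/20180922/A, OD/689/20180923/D]
        System.out.println(extractParts(str));
    }
}
```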