Regular Expressions in Java


Licensed under CC 4.0 BY-SA


A regular expression defines a search pattern for strings and can be used to search, edit, and otherwise manipulate text. The following is a regular expression:

^[0-9]

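It defines a pattern that matches a digit at the start of the input. Note that the position of the caret matters: inside the brackets, as in [^0-9], it negates the class and the pattern matches any non-digit character instead.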
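To make the difference between the anchor and the negated class concrete, here is a minimal sketch. The class name `AnchorVersusNegation` and the helper `found` are made up for illustration; only `Pattern` and `Matcher` from `java.util.regex` are real API.

```java
import java.util.regex.Pattern;

class AnchorVersusNegation {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ^[0-9] : a digit at the very start of the input
        System.out.println(found("^[0-9]", "7 apples")); // true
        System.out.println(found("^[0-9]", "apples 7")); // false
        // [^0-9] : any character that is not a digit
        System.out.println(found("[^0-9]", "12345"));    // false
        System.out.println(found("[^0-9]", "123a5"));    // true
    }
}
```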

Regular Expression Rules

Character Matchers

The following table lists commonly used character matchers:

Table 1 Character matchers

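Regular expression	Description
`.`	Matches any single character (by default, excluding line terminators)
`^regex`	Matches regex at the beginning of a line (of the whole input unless multi-line mode is enabled)
`regex$`	Matches regex at the end of a line (of the whole input unless multi-line mode is enabled)
`[abc]`	Matches one of the characters a, b, or c
`[abc][vz]`	Matches a, b, or c followed by v or z
`[^abc]`	Matches any single character except a, b, or c
`[a-d]`	Matches one character in the range a to d
`[0-8]`	Matches one digit in the range 0 to 8
`XZ`	Matches X immediately followed by Z
`X|Z`	Matches X or Z
`$`	Asserts the end of a line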
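The rows of Table 1 can be checked directly with `String.matches`, which tests the whole string against the pattern. This is only a sketch; the class name `CharacterClassDemo` and the helper `fullMatch` are illustrative.

```java
class CharacterClassDemo {
    // Hypothetical helper: true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc][vz]", "bz")); // true: b, then z
        System.out.println(fullMatch("[abc][vz]", "dv")); // false: d is not in [abc]
        System.out.println(fullMatch("[^abc]", "d"));     // true
        System.out.println(fullMatch("[^abc]", "a"));     // false
        System.out.println(fullMatch("[a-d]", "c"));      // true
        System.out.println(fullMatch("[0-8]", "9"));      // false
        System.out.println(fullMatch("X|Z", "Z"));        // true
        System.out.println(fullMatch(".", "\n"));         // false: dot skips line terminators by default
    }
}
```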

Metacharacters

To keep expressions short, regular expressions provide several metacharacters:

Table 2 Metacharacters

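Regular expression	Description
`\d`	Any digit; shorthand for `[0-9]`
`\D`	Any non-digit; shorthand for `[^0-9]`
`\s`	A whitespace character; shorthand for `[ \t\n\x0b\r\f]`
`\S`	A non-whitespace character; shorthand for `[^\s]`
`\w`	A word character; shorthand for `[a-zA-Z_0-9]`
`\W`	A non-word character; shorthand for `[^\w]`
`\S+`	One or more non-whitespace characters
`\b`	A word boundary, where a word character is one of `[a-zA-Z0-9_]`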
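The shorthand classes in Table 2 are easiest to see through replacement and splitting. The sketch below uses only `String.replaceAll` and `String.split`; the class and method names are invented for the example.

```java
import java.util.Arrays;

class MetaCharacterDemo {
    // Hypothetical helper: replaces every match of regex in input with mark.
    static String mark(String regex, String input, String mark) {
        return input.replaceAll(regex, mark);
    }

    public static void main(String[] args) {
        System.out.println(mark("\\d", "a1b22c", "#"));       // a#b##c
        System.out.println(mark("\\D", "a1b22c", ""));        // 122
        System.out.println(mark("\\W", "foo_bar baz!", "-")); // foo_bar-baz-
        // \b keeps "cat" inside "concat" from being replaced
        System.out.println(mark("\\bcat\\b", "cat concat cat", "dog")); // dog concat dog
        // \s+ treats any run of blanks and tabs as one separator
        System.out.println(Arrays.toString("one  two\tthree".split("\\s+"))); // [one, two, three]
    }
}
```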

Quantifiers

Regular expressions also provide quantifiers, which specify how many times an element may occur. The main ones are:

Table 3 Quantifiers

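Regular expression	Description	Example
`*`	Zero or more occurrences; same as `{0,}`	`x*` finds zero or more occurrences of x; `.*` matches any string (excluding line terminators by default)
`+`	One or more occurrences; same as `{1,}`	`x+` finds one or more occurrences of x
`?`	At most one occurrence; same as `{0,1}`	`x?` finds zero or one occurrence of x
`{n}`	Exactly n occurrences	`\d{3}` finds a run of exactly three digits
`{n1,n2}`	Between n1 and n2 occurrences, inclusive	`\d{1,4}` finds a run of one to four digits
`*?`	A `?` placed after a quantifier turns it into a reluctant (lazy) quantifier: it tries the shortest possible match first and stops extending as soon as the rest of the pattern matches	`.*?` matches as few characters as possible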
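The difference between a greedy and a reluctant quantifier shows up most clearly on input with repeated delimiters. The following is a small sketch; `QuantifierDemo` and `firstMatch` are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        // Greedy: .* grabs as much as possible, so the match runs to the LAST </b>
        System.out.println(firstMatch("<b>.*</b>", html));  // <b>one</b><b>two</b>
        // Reluctant: .*? stops at the FIRST </b>
        System.out.println(firstMatch("<b>.*?</b>", html)); // <b>one</b>

        System.out.println("123".matches("\\d{3}"));        // true
        System.out.println("12345".matches("\\d{1,4}"));    // false: five digits
        System.out.println("color".matches("colou?r"));     // true
        System.out.println("colour".matches("colou?r"));    // true
    }
}
```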

Specifying Modes for a Regular Expression

A mode modifier can be placed at the start of a regular expression:

(?i) makes the regular expression case-insensitive
(?s) single-line mode: the dot matches all characters, including line breaks
(?m) multi-line mode: the caret and dollar match at the start and end of each line in the subject string

To enable several modes at once, combine them: (?ism)
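A short sketch of how each modifier changes matching behavior. The helper `countMatches` and the class name are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModeModifierDemo {
    // Hypothetical helper: counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // (?i): ignore case
        System.out.println("HELLO".matches("hello"));      // false
        System.out.println("HELLO".matches("(?i)hello"));  // true
        // (?s): let the dot match a line break
        System.out.println("a\nb".matches("a.b"));         // false
        System.out.println("a\nb".matches("(?s)a.b"));     // true
        // (?m): ^ and $ apply to every line, not just the whole input
        String lines = "one\ntwo\nthree";
        System.out.println(countMatches("^\\w+$", lines));     // 0
        System.out.println(countMatches("(?m)^\\w+$", lines)); // 3
        // Modifiers can be combined
        System.out.println("A\nB".matches("(?is)a.b"));    // true
    }
}
```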

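Using Regular Expressions in Java

The String class in Java supports regular expressions for manipulating strings, which makes text processing much more convenient:

Table 4 Regular expression methods in String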

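Method	Description
`str.matches("regex")`	Returns whether the entire string `str` matches `regex`
`str.split("regex")`	Splits `str` around matches of `regex`
`str.replaceFirst("regex","replacement")`	Replaces the first substring that matches `regex` with `replacement`
`str.replaceAll("regex","replacement")`	Replaces every substring that matches `regex` with `replacement`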

Example:

    package de.vogella.regex.test;

    public class RegexTestStrings {
        public static final String EXAMPLE_TEST = "This is my small example "
                + "string which I'm going to " + "use for pattern matching.";

        public static void main(String[] args) {
            // matches() tests the whole string: one word character, then anything
            System.out.println(EXAMPLE_TEST.matches("\\w.*"));
            // split on runs of whitespace
            String[] splitString = EXAMPLE_TEST.split("\\s+");
            System.out.println(splitString.length); // should be 14
            for (String string : splitString) {
                System.out.println(string);
            }
            // replace each run of whitespace with a tab
            System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
        }
    }
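The example above does not exercise `replaceFirst`, and it is easy to overlook that `matches` must cover the whole string. The sketch below fills in those two points; the class and helper names are invented for illustration.

```java
class StringRegexMethodsDemo {
    // Hypothetical helpers wrapping the two replace methods from Table 4.
    static String maskFirstDigit(String s) {
        return s.replaceFirst("\\d", "#");
    }

    static String maskAllDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        // matches() succeeds only if the pattern covers the entire string
        System.out.println("abc123".matches("\\d+")); // false
        System.out.println("123".matches("\\d+"));    // true

        System.out.println(maskFirstDigit("a1b2"));   // a#b2
        System.out.println(maskAllDigits("a1b2"));    // a#b#

        System.out.println("2024-01-15".split("-").length); // 3
    }
}
```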

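Note: in a Java string literal the backslash \ is an escape character, so a single backslash in a regular expression must be written as \\ in source code.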
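A few cases where the escaping rule matters in practice, as a rough sketch. `Pattern.quote` is a standard method that treats its argument as a literal; the class and helper names are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BackslashEscapingDemo {
    // Hypothetical helper: the regex \\ (one literal backslash) needs four backslashes in Java source.
    static String[] splitOnBackslash(String path) {
        return path.split("\\\\");
    }

    // Hypothetical helper: Pattern.quote avoids hand-escaping the dot.
    static String[] splitOnDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        // An unescaped dot matches every character, so nothing is left after splitting
        System.out.println("a.b".split(".").length);   // 0
        // Escaped dot: the regex is \. and the Java literal is "\\."
        System.out.println("a.b".split("\\.").length); // 2
        System.out.println(splitOnDot("a.b").length);  // 2
        System.out.println(Arrays.toString(splitOnBackslash("C:\\temp\\file.txt"))); // [C:, temp, file.txt]
    }
}
```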

Pattern/Matcher

For more advanced use, Java provides two classes, Pattern (java.util.regex.Pattern) and Matcher (java.util.regex.Matcher):

First, compile the regular expression into a Pattern;
then, use a Matcher to apply it to the target string.

Example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegularExpression {
        private String regx;
        private Pattern pattern;

        public static void main(String[] args) {
            // regular expression test
            String input = "here 2016, now we encounter very confusing things. On the one hand, we human beings feel "
                    + "very confident, but on the other hand, we are so lost in our self-built world! we do waste our energy "
                    + "and time on useless things. We are totally lost.";

            String regDigit = "\\d";           // single digits
            String regChars = "hand|useless";  // either of the two words
            String regWild = "[^af]";          // any character except a or f
            String regWord = "\\w";            // single word characters
            String regTimes = "[a-z]{5}";      // five consecutive lowercase letters

            RegularExpression reg = new RegularExpression(regDigit);
            reg.getMatcherResult(input);

            reg.setRegx(regChars);
            reg.getMatcherResult(input);

            reg.setRegx(regWild);
            reg.getMatcherResult(input);

            reg.setRegx(regWord);
            reg.getMatcherResult(input);

            reg.setRegx(regTimes);
            reg.getMatcherResult(input);
        }

        public RegularExpression(String reg) {
            this.regx = reg;
            this.pattern = Pattern.compile(regx);
        }

        // Prints every match of the current pattern found in the input.
        public void getMatcherResult(String in) {
            System.out.println("current regular expression is " + regx);
            Matcher matcher = pattern.matcher(in);

            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }

        // Stores the expression and recompiles the pattern.
        public void setRegx(String regx) {
            this.regx = regx;
            this.pattern = Pattern.compile(regx);
        }
    }
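The example above relies on `Matcher.find()`, which scans for the next occurrence anywhere in the input. `Matcher.matches()`, by contrast, requires the whole input to match, just like `String.matches`. The sketch below contrasts the two with one compiled Pattern; `FindVersusMatches` and `findAll` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Hypothetical helper: collects every substring that find() locates, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Compile once, then reuse the Pattern for several inputs
        Pattern fiveLetters = Pattern.compile("[a-z]{5}");

        System.out.println(findAll(fiveLetters, "hello world, regex"));    // [hello, world, regex]
        System.out.println(fiveLetters.matcher("hello").matches());       // true
        System.out.println(fiveLetters.matcher("hello world").matches()); // false
    }
}
```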