Jakarta Regexp (regular expressions for Java/Struts)

Reposted.

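This article describes the elements of regular expression syntax, including character classes, predefined classes, and boundary matchers, and explains the difference between greedy and reluctant closures. It also lists the logical operators and backreferences.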

   Character Classes

      [abc]                  Simple character class
      [a-zA-Z]               Character class with ranges
      [^abc]                 Negated character class
NOTE: Incomplete ranges will be interpreted as "starts from zero" or "ends with last character".
That is, [-a] is the same as [\u0000-a], [a-] is the same as [a-\uFFFF], and [-] means "all characters".
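
The three character-class forms above can be tried with a short sketch. It assumes the JDK's java.util.regex package instead of Jakarta Regexp, so that it runs without extra libraries, and the class name CharClassDemo is hypothetical. The incomplete-range behaviour in the NOTE belongs to Jakarta Regexp and is not reproduced here, since java.util.regex may interpret such ranges differently.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns true when the whole input is matched by the given character class.
    static boolean matchesClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[abc]", "b"));     // simple class
        System.out.println(matchesClass("[a-zA-Z]", "Q"));  // class with ranges
        System.out.println(matchesClass("[^abc]", "b"));    // negated class
    }
}
```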

  Standard POSIX Character Classes

      [:alnum:]              Alphanumeric characters.
      [:alpha:]              Alphabetic characters.
      [:blank:]              Space and tab characters.
      [:cntrl:]              Control characters.
      [:digit:]              Numeric characters.
      [:graph:]              Characters that are printable and are also visible.
                           (A space is printable, but not visible, while an
                           `a' is both.)
      [:lower:]              Lower-case alphabetic characters.
      [:print:]              Printable characters (characters that are not
                           control characters.)
      [:punct:]              Punctuation characters (characters that are not letters,
                           digits, control characters, or space characters).
      [:space:]              Space characters (such as space, tab, and formfeed,
                           to name a few).
      [:upper:]              Upper-case alphabetic characters.
      [:xdigit:]             Characters that are hexadecimal digits.

  Non-standard POSIX-style Character Classes

      [:javastart:]          Start of a Java identifier
      [:javapart:]           Part of a Java identifier

  Predefined Classes

      .           Matches any character other than newline
      \w          Matches a "word" character (alphanumeric plus "_")
      \W          Matches a non-word character
      \s          Matches a whitespace character
      \S          Matches a non-whitespace character
      \d          Matches a digit character
      \D          Matches a non-digit character
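
As a rough illustration of the predefined classes, the sketch below again assumes java.util.regex rather than Jakarta Regexp; the class and method names are hypothetical. Note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True when the input consists only of word characters (letters, digits, "_").
    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    // True when the input consists only of digit characters.
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // Replaces every run of whitespace characters with a single space.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }
}
```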

  Boundary Matchers

      ^           Matches only at the beginning of a line
      $           Matches only at the end of a line
      \b          Matches only at a word boundary
      \B          Matches only at a non-word boundary
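
The boundary matchers can be sketched as follows, assuming java.util.regex as a stand-in for Jakarta Regexp; the names BoundaryDemo, countWholeWord and countLinesStartingWith are hypothetical. In java.util.regex, ^ matches at the start of every line only when the MULTILINE flag is set, which is why the second method passes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts occurrences of `word` as a whole word, using \b on both sides.
    static int countWholeWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Counts lines that begin with `prefix`; MULTILINE lets ^ match after each line terminator.
    static int countLinesStartingWith(String prefix, String text) {
        Matcher m = Pattern.compile("^" + Pattern.quote(prefix), Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```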

  Greedy Closures

      A*          Matches A 0 or more times (greedy)
      A+          Matches A 1 or more times (greedy)
      A?          Matches A 1 or 0 times (greedy)
      A{n}        Matches A exactly n times (greedy)
      A{n,}       Matches A at least n times (greedy)
      A{n,m}      Matches A at least n but not more than m times (greedy)

  Reluctant Closures

      A*?         Matches A 0 or more times (reluctant)
      A+?         Matches A 1 or more times (reluctant)
      A??         Matches A 0 or 1 times (reluctant)

  Logical Operators

      AB          Matches A followed by B
      A|B         Matches either A or B
      (A)         Used for subexpression grouping
     (?:A)        Used for subexpression clustering (just like grouping but
                no backrefs)
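
The difference between grouping and clustering can be seen in how many subexpressions are captured. The sketch below assumes java.util.regex, which uses the same (A) and (?:A) notation, rather than Jakarta Regexp; GroupingDemo and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // Number of capturing groups the pattern defines; (?:...) clusters are not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 of the first match, or null if nothing matches.
    // The pattern is expected to define at least one capturing group.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

With "(ab)+(c|d)" there are two groups, while "(?:ab)+(c|d)" has only one, so the alternative (c|d) becomes group 1.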

  Backreferences

      \1      Backreference to 1st parenthesized subexpression
      \2      Backreference to 2nd parenthesized subexpression
      \3      Backreference to 3rd parenthesized subexpression
      \4      Backreference to 4th parenthesized subexpression
      \5      Backreference to 5th parenthesized subexpression
      \6      Backreference to 6th parenthesized subexpression
      \7      Backreference to 7th parenthesized subexpression
      \8      Backreference to 8th parenthesized subexpression
      \9      Backreference to 9th parenthesized subexpression
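
A common use of a backreference is detecting a repeated word. The sketch below assumes java.util.regex, where \1 has the same meaning, instead of Jakarta Regexp; BackrefDemo and hasRepeatedWord are hypothetical names.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // A whole word, some whitespace, then the same word again via backreference \1.
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // True when the text contains the same word twice in a row.
    static boolean hasRepeatedWord(String text) {
        return REPEATED_WORD.matcher(text).find();
    }
}
```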

All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), you can simply follow it with a '?'. A reluctant closure will match as few elements of the string as possible when finding matches. {m,n} closures don't currently support reluctancy.
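
The greedy and reluctant behaviour described above can be compared side by side. This is a minimal sketch that assumes java.util.regex rather than Jakarta Regexp; ClosureDemo and firstMatch are hypothetical names. The Jakarta-specific limitation on {m,n} is not exercised here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosureDemo {
    // Returns the text of the first match of `regex` in `input`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));   // greedy: <b>bold</b>
        System.out.println(firstMatch("<.+?>", html));  // reluctant: <b>
    }
}
```

The greedy form runs to the last '>' in the input, while the reluctant form stops at the first one.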

Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
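
The list above can be written out as one explicit pattern. The sketch below assumes java.util.regex; LineTerminatorDemo and splitLines are hypothetical names. The two-character sequence is listed first in the alternation so that a carriage return followed by a newline counts as a single terminator.

```java
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // CR LF first, then each single-character terminator:
    // LF, CR, next-line (U+0085), line separator (U+2028), paragraph separator (U+2029).
    private static final Pattern LINE_TERMINATOR =
            Pattern.compile("\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]");

    // Splits text into lines; the -1 limit keeps trailing empty lines.
    static String[] splitLines(String text) {
        return LINE_TERMINATOR.split(text, -1);
    }
}
```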