Regular Expressions

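This article introduces the basics of regular expressions in Perl, including search patterns, replacement patterns, and operators, and uses examples to show how to match and replace text. It also explains how to run regular-expression matches and substitutions from the Perl command line.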

 

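Today I studied Perl regular expressions. Are they important? I had not used them for a long time, but whenever I need them I get stuck, because each time I used them I threw the knowledge away afterwards and never really understood it.
The world of regular expressions is full of interesting things. Believe that, and study it down to the core.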

Regular expressions help us match and replace text.

I divide regular expressions into three parts: search patterns, replacement patterns, and operators.

 

1 Search patterns tell us how to match

1) . * ^ $ {n,m} ( ) [ ]

2) \w \W \s \S \d \D

 

2 Replacement patterns tell us what to do with the matched text

 

 

3 Operators are not really part of regular expressions themselves; they belong to Perl.

m// (Matching)

s/// (Substitution)

split

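Perl is also commonly run straight from the command line, e.g. perl -ne 's///ig' file.txt

-n wraps the code in a while (<>) loop over the input lines (use -p instead to also print each line after the code has run)

-e means the program text is given on the command line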
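
As a rough sketch of what the -n and -p switches do, the script below spells out the loop by hand. The sample lines and the pattern foo are made up for illustration; a real one-liner would read the files named on the command line through while (<>).

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines of file.txt.
my @lines = ("Foo one\n", "no match\n", "foo and FOO\n");

# perl -pe 's/foo/bar/ig' file.txt behaves roughly like this loop:
# take each line, run the code given after -e on it, then print it.
# With -n the loop is the same, but nothing is printed unless the
# code itself calls print.
my @output;
for my $original (@lines) {
    my $line = $original;     # copy so the sample data stays unchanged
    $line =~ s/foo/bar/ig;    # the code that would follow -e
    push @output, $line;
}
print @output;
```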

 

 

 

1.3.2.1 Search patterns

The characters in the following table have special meaning only in search patterns:

Character

Pattern

.

Match any single character except newline. Can match newline in awk.

*

Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character."

^

Match the following regular expression at the beginning of the line or string.

$

Match the preceding regular expression at the end of the line or string.

\

Turn off the special meaning of the following character.

[ ]

Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally).

{n,m}

Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a metacharacter. {n} matches exactly n occurrences; {n,} matches at least n occurrences; and {n,m} matches any number of occurrences between n and m. n and m must be between 0 and 255, inclusive.

\{n,m\}

Just like {n,m}, but with backslashes in front of the braces.

\( \)

Save the pattern enclosed between \( and \) into a special holding space. Up to nine patterns can be saved on a single line. The text matched by the subpatterns can be "replayed" in substitutions by the escape sequences \1 to \9.

\n

Replay the nth sub-pattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.

In Perl regular expressions, the content inside ( ) forms a group, and the text it matches is also saved in a variable. That captured text can be used in the search pattern (as \1, \2, ...) and in the replacement pattern (as $1, $2, ...).

\< \>

Match characters at beginning (\<) or end (\>) of a word.

+

Match one or more instances of preceding regular expression.

?

Match zero or one instances of preceding regular expression.

|

Match the regular expression specified before or after.

( )

Apply a match to the enclosed group of regular expressions.

\w

Word character

\W

Non-word character

\d

Digit character

\D

Non-digit character

\s

Whitespace character

\S

Non-whitespace character
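
To make the table concrete, here is a minimal sketch that tries several of these metacharacters in Perl. The sample strings are invented for illustration. Note that Perl writes grouping and intervals with plain ( ) and { }, not the backslashed \( \) and \{ \} forms used by some older Unix tools.

```perl
use strict;
use warnings;

# Each entry: what is being shown, a made-up sample string, and a pattern.
my @tests = (
    [ 'anchors ^ and $',    'perl',    qr/^perl$/        ],
    [ 'dot and star',       'a-----z', qr/^a.*z$/        ],
    [ 'character class',    'gray',    qr/^gr[ae]y$/     ],
    [ 'negated class',      'abc',     qr/^[^0-9]+$/     ],
    [ 'interval {n,m}',     '2024',    qr/^\d{2,4}$/     ],
    [ 'optional with ?',    'color',   qr/^colou?r$/     ],
    [ 'alternation with |', 'dog',     qr/^(?:cat|dog)$/ ],
    [ 'shorthand classes',  'id 42',   qr/^\w+\s\d+$/    ],
);

my $passed = 0;
for my $t (@tests) {
    my ($what, $string, $re) = @$t;
    my $ok = $string =~ $re ? 1 : 0;
    $passed += $ok;
    printf "%-22s %-10s %s\n", $what, "'$string'", $ok ? 'matches' : 'no match';
}
```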

 

 

 

1.3.2.2 Replacement patterns

The characters in the following table have special meaning only in replacement patterns:

Character

Pattern

\

Turn off the special meaning of the following character.

\n

Restore the text matched by the nth pattern previously saved by \( and \). n is a number from 1 to 9, with 1 starting on the left.

&

Reuse the text matched by the search pattern as part of the replacement pattern.

~

Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ex and vi).

%

Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ed).

\u

Convert first character of replacement pattern to uppercase.

\U

Convert entire replacement pattern to uppercase.

\l

Convert first character of replacement pattern to lowercase.

\L

Convert entire replacement pattern to lowercase.

\E

Turn off previous \U or \L.

\e

Turn off previous \u or \l.
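
The table above covers several Unix tools, so not every entry carries over to Perl unchanged. In Perl the captured groups are written $1, $2, ... in the replacement, the whole match is $& rather than &, and \u and \l affect only the next character, so no \e terminator is needed. A small sketch, using an invented name as input:

```perl
use strict;
use warnings;

my $name = 'john SMITH';    # hypothetical input

# $1 and $2 replay the captured groups. \u uppercases the next character,
# and \L lowercases everything up to \E.
(my $fixed = $name) =~ s/(\w+) (\w+)/\u$1 \u\L$2\E/;
print "$fixed\n";

# \U ... \E uppercases a whole stretch of the replacement.
# Without /g only the first word is replaced.
(my $shout = $name) =~ s/(\w+)/\U$1\E!/;
print "$shout\n";
```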

 

 

 

 

1.3.2 Regular Expression Operators

Perl provides the built-in regular expression operators qr//, m//, and s///, as well as the split function. Each operator accepts a regular expression pattern string that is run through string and variable interpolation and then compiled.

Regular expressions are often delimited with the forward slash, but you can pick any non-alphanumeric, non-whitespace character. Here are some examples:

qr#...#       m!...!        m{...}
s|...|...|    s[...][...]   s<...>/.../

A match delimited by slashes (/.../) doesn't require a leading m:

/.../      #same as m/.../

Using the single quote as a delimiter suppresses interpolation of variables and the constructs \N{name}, \u, \l, \U, \L, \Q, \E. Normally these are interpolated before being passed to the regular expression engine.
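
The following sketch illustrates the delimiter choices described above. The path and the address are made-up sample data.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';    # hypothetical sample data
my $addr = 'alice@example.com';

# Alternate delimiters avoid escaping every slash in a path.
my $in_usr = $path =~ m{^/usr/} ? 1 : 0;

# s|...|...| works the same way for substitutions.
(my $moved = $path) =~ s|^/usr/local|/opt|;

# Single-quote delimiters switch interpolation off, so the @ below is a
# literal character rather than the start of an array name.
my $is_example = $addr =~ m'^\w+@example\.com\z' ? 1 : 0;

print "$in_usr $moved $is_example\n";
```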

qr// (Quote Regex)


qr/PATTERN/ismxo

Quote and compile PATTERN as a regular expression. The returned value may be used in a later pattern match or substitution. This saves time if the regular expression is going to be repeatedly interpolated. The match modes (or lack of), /ismxo, are locked in.
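
As an illustration of a locked-in mode, the sketch below compiles a pattern once with /i and then reuses it, both on its own and inside a larger pattern that has no /i of its own. The list of titles is sample data chosen for the example.

```perl
use strict;
use warnings;

# Compile once; the /i mode is locked into the compiled object.
my $word_re = qr/perl/i;

my @titles = (
    'Learning Perl',
    'Programming PERL',
    'Mastering Regular Expressions',
);

# Use the compiled regex directly ...
my @hits = grep { $_ =~ $word_re } @titles;

# ... or interpolate it into a larger pattern. The outer pattern has no
# /i, yet "PERL" still matches because the mode travels with $word_re.
my @at_end = grep { /\b$word_re\z/ } @titles;

print scalar(@hits), " titles mention Perl; ", scalar(@at_end), " end with it\n";
```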

m// (Matching)


m/PATTERN/imsxocg

Match PATTERN against input string. In list context, returns a list of substrings matched by capturing parentheses, or else (1) for a successful match or ( ) for a failed match. In scalar context, returns 1 for success or "" for failure. /imsxo are optional mode modifiers. /cg are optional match modifiers. /g in scalar context causes the match to start from the end of the previous match. In list context, a /g match returns all matches or all captured substrings from all matches. A failed /g match will reset the match start to the beginning of the string unless the match is in combined /cg mode.
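
The context rules above are easier to see side by side. This is a minimal sketch with an invented input string; pos() is Perl's built-in that reports where the last /g match on a string ended.

```perl
use strict;
use warnings;

my $log = 'x=10, y=20, z=30';    # hypothetical input

# List context without /g: the captured substrings of the first match.
my ($name, $value) = $log =~ /(\w)=(\d+)/;

# List context with /g: the captures from every match, flattened.
my @pairs = $log =~ /(\w)=(\d+)/g;

# Scalar context with /g: each test resumes where the previous match ended.
my @positions;
while ($log =~ /\d+/g) {
    push @positions, pos($log);
}

print "$name=$value\n";
print "@pairs\n";
print "@positions\n";
```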

s/// (Substitution)


s/PATTERN/REPLACEMENT/egimosx

Match PATTERN in the input string and replace the match text with REPLACEMENT, returning the number of successes. /imosx are optional mode modifiers. /g substitutes all occurrences of PATTERN. Each /e causes an evaluation of REPLACEMENT as Perl code.
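
A short sketch of the return value and of the /e modifier, using a made-up input string:

```perl
use strict;
use warnings;

my $text = 'a1 b22 c333';    # hypothetical input

# s/// returns the number of substitutions made; /g replaces every match.
my $masked = $text;
my $count  = ($masked =~ s/\d/#/g);

# /e evaluates the replacement as Perl code; here each number is doubled.
my $doubled = $text;
$doubled =~ s/(\d+)/$1 * 2/ge;

print "$count: $masked\n";
print "$doubled\n";
```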

split


split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split

Return a list of substrings surrounding matches of PATTERN in EXPR. If LIMIT is given and positive, EXPR is split into at most LIMIT substrings, and the last one holds the unsplit remainder. The pattern argument is a match operator, so use m if you want alternate delimiters (e.g., split m{PATTERN}). The match permits the same modifiers as m{}. Table 1-8 lists the after-match variables.
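
The sketch below shows split with and without LIMIT, and with an alternate delimiter. The passwd-style record is sample data invented for the example.

```perl
use strict;
use warnings;

my $record = 'root:x:0:0:root:/root:/bin/bash';    # sample passwd-style line

# No LIMIT: every field.
my @all = split /:/, $record;

# LIMIT of 3: at most three pieces; the last keeps the unsplit remainder.
my @first = split /:/, $record, 3;

# Alternate delimiters need the explicit m.
my @bits = split m{/}, 'usr/local/bin';

print scalar(@all), " fields\n";
print "$first[2]\n";
print "@bits\n";
```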

 

 

1.3.4 Examples
Example 1-1. Simple match
# Match Spider-Man, Spiderman, SPIDER-MAN, etc.
my $dailybugle = "Spider-Man Menaces City!";
if ($dailybugle =~ m/spider[- ]?man/i) { do_something(  ); }
Example 1-2. Match, capture group, and qr
# Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
my $date  = "12/30/1969";
my $regex = qr!(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)!;
if ($date =~ m/$regex/) {
  # The format is MM/DD/YYYY, so $1 is the month and $2 the day
  print "Month=", $1,
        " Day=",  $2,
        " Year=", $3, "\n";
}
Example 1-3. Simple substitution
# Convert <br> to <br /> for XHTML compliance
my $text = "Hello World! <br>";
$text =~ s#<br>#<br />#ig;
Example 1-4. Harder substitution
# urlify - turn URL's into HTML links
$text = "Check the website, http://www.oreilly.com/catalog/repr.";
$text =~ 
    s{
      \b                         # start at word boundary
      (                          # capture to $1
       (https?|telnet|gopher|file|wais|ftp) : 
                                 # resource and colon
       [\w/#~:.?+=&%@!\-] +?     # one or more valid
                                 # characters 
                                 # but take as little as
                                 # possible
      )
      (?=                        # lookahead   
        [.:?\-] *                #  for possible punctuation
        (?: [^\w/#~:.?+=&%@!\-]  #  invalid character
          | $ )                  #  or end of string
      )
     }{<a href="$1">$1</a>}igox;
 
Match any word (a word is defined as a sequence of alphanumerics with no whitespace) that contains a double letter. For example, "book" has a double "o" and "feed" has a double "e".
 
/([a-zA-Z])\1/ Note the \1 here: it stands for the text matched by the preceding ( ).
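
As a quick check of the backreference, this sketch picks out the words with a double letter. The sentence is a made-up sample.

```perl
use strict;
use warnings;

my $sentence = 'I feed the book to a cat';    # hypothetical input

# ([a-zA-Z]) captures one letter and \1 requires the same letter again.
my @doubles = grep { /([a-zA-Z])\1/ } split ' ', $sentence;

print "@doubles\n";
```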

 
