Perl: regular expressions


1. work on $_ by default

i.e. the match and substitution operators work on $_ when no variable is given.

while (<>)
{
    print if m/leo/;      # print the line if it contains "leo"
    print s/leo/LEO/;     # print the number of substitutions made (1 here)
    print $_;             # print the line after the substitution
}

 

[root@sxvvr10 lzc]# ./sysadm.pl
leoisOK
leoisOK
1LEOisOK

 

2. binding operators

=~ : true if the pattern matches

!~ : true if the pattern does not match
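
As a minimal sketch of the two binding operators, the snippet below binds patterns to a named variable instead of $_. The variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# $line is a hypothetical sample string, not taken from the post.
my $line = "leo is OK";

# =~ is true when the pattern matches the variable on its left.
my $has_leo = ($line =~ /leo/) ? 1 : 0;

# !~ is true when the pattern does NOT match.
my $no_digits = ($line !~ /\d/) ? 1 : 0;

print "contains leo\n"       if $has_leo;
print "contains no digits\n" if $no_digits;
```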

 

3.  modifiers

/i  --- case insensitive

/g --- global matches
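
A small sketch of how /i and /g change the result of a match and a substitution. The sample text and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $text = "Leo met leo and LEO";   # hypothetical sample data

# /i ignores case, so the capitalised "Leo" already matches.
my $matched = ($text =~ /leo/i) ? 1 : 0;

# /g in list context returns every match, not only the first one.
my @all = $text =~ /leo/gi;

# With s///, /g replaces every occurrence; work on a copy of $text.
(my $copy = $text) =~ s/leo/X/gi;

print "matched: $matched\n";
print "found ", scalar(@all), " matches: @all\n";
print "$copy\n";
```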

 

multi-line modifiers:

Mode         Specified with      ^ matches...      $ matches...     Dot matches newline
default      neither /s nor /m   start of string   end of string    No
single-line  /s                  start of string   end of string    Yes
multi-line   /m                  start of line     end of line      No
multi-line   /m and /s           start of line     end of line      Yes

 

$_ = "This is some text
and some more text spanning
several lines";
print if /^and.*spanning$/;     # no match: without /m, ^ only matches at the start of the string
print if /^and.*spanning$/m;    # match
print if /^and.*lines$/m;       # no match, because . cannot match \n
print if /^and.*lines$/ms;      # match

 

4. meta chars

\w --- [0-9a-zA-Z_] (for ASCII text)

\W --- any character other than \w.

\b  ---  word boundary: a zero-length match between \w and \W, or between \W and \w.

\B  ---  a zero-length match at any position that is not a \b.

e.g. /\bword\B/ matches in the string '$word2': $ is \W and w is \w, so \b matches the zero-length position between $ and w; d and 2 are both \w, so \B matches the zero-length position between d and 2.
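
The boundary example can be checked with a short sketch like the one below. The second pattern is an added contrast case, not from the original text.

```perl
use strict;
use warnings;

my $s = '$word2';   # single quotes, so $word2 is not interpolated

# \b sits between '$' (\W) and 'w' (\w); \B sits between 'd' and '2' (both \w).
my $m1 = ($s =~ /\bword\B/) ? 1 : 0;

# Contrast: \b after "word" fails here, because 'd' and '2' are both word characters.
my $m2 = ($s =~ /\bword\b/) ? 1 : 0;

print "\\bword\\B matches: $m1\n";
print "\\bword\\b matches: $m2\n";
```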

 

5. quantifier

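?  --- zero or one

*  --- zero or more

+  --- one or more

{n}  --- exactly n times

{n,}  --- at least n times

{n,m}  --- at least n and at most m times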
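
A rough sketch exercising each quantifier once. The patterns and test strings are invented for illustration; every pattern is anchored so that the quantifier has to account for the whole string.

```perl
use strict;
use warnings;

# Each entry is [pattern, string]; all data here is hypothetical.
my @cases = (
    [ qr/^ab?c$/,     "ac"     ],   # ?     : zero or one "b"
    [ qr/^ab*c$/,     "abbbc"  ],   # *     : zero or more
    [ qr/^ab+c$/,     "ac"     ],   # +     : one or more, so this one fails
    [ qr/^ab{2}c$/,   "abbc"   ],   # {n}   : exactly 2
    [ qr/^ab{2,}c$/,  "abbbbc" ],   # {n,}  : 2 or more
    [ qr/^ab{1,2}c$/, "abbbc"  ],   # {n,m} : 1 to 2, so three "b"s fail
);

my @results;
for my $case (@cases) {
    my ($re, $str) = @$case;
    push @results, ($str =~ $re) ? 1 : 0;
}

print "@results\n";   # prints: 1 1 0 1 1 0
```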

 

6. character classes

[a-zA-Z0-9_]: matches exactly one character from the class.
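
To illustrate that a class consumes a single character unless a quantifier follows it, here is a small sketch. The sample string and variable names are assumptions.

```perl
use strict;
use warnings;

my $s = "foo_bar42!";   # hypothetical sample string

# A bare class matches exactly one character.
my ($first) = $s =~ /([a-zA-Z0-9_])/;

# Adding + lets the class match a run of such characters.
my ($run) = $s =~ /([a-zA-Z0-9_]+)/;

# A leading ^ inside the brackets negates the class.
my ($other) = $s =~ /([^a-zA-Z0-9_])/;

print "first: $first, run: $run, other: $other\n";
```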

 

7. alternation

| matches any one of a set of alternatives, each of which can be a longer string.

print if /one|two|three/;      # matches xxonexx, xxtwoxx or xxthreexx
print if /^one|two|three$/;    # matches "one" at the start, "two" anywhere, or "three" at the end
print if /^(one|two|three)$/;  # matches exactly one, two or three

 

8. groupings -- round brackets

/\b([^\W\d_aeiou][aeiou]){2}\b/ # matches a consonant followed by a vowel, twice in a row, e.g. tutu or tofu

 

Round brackets also capture the matched strings into scalars ($1, $2, ...); use (?:...) instead to group without capturing.

 

my $str = "It is perl I like most";
if ($str =~ m/(?:perl.*most)/)
{
    print "matched! $1\n";   # warns: Use of uninitialized value $1 in concatenation (.) or string at ./sysadm.pl line 32.
}                            # prints: matched!
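
For comparison, a sketch that puts capturing and non-capturing groups side by side. The patterns are variations on the example string, chosen for illustration.

```perl
use strict;
use warnings;

my $str = "It is perl I like most";

# Two capturing groups: a match in list context returns both captures.
my @captured = $str =~ /(perl).*(most)/;

# (?:...) groups without capturing, so only one capture is returned.
my @grouped = $str =~ /(?:perl).*(most)/;

print scalar(@captured), " captures: @captured\n";
print scalar(@grouped), " capture: @grouped\n";
```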

 

9. Capture matched strings to scalars.

$1, $2, $3 ... hold the text matched by the 1st, 2nd, 3rd ... set of parentheses. They can be used in the replacement part of a substitution, "s/(...)/$1/", but not inside the pattern that is currently matching, "m/(...) $1/"; use the backreferences \1, \2 ... there instead.
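
A minimal sketch of the difference: \1 refers back to the first group inside the pattern, while $1 is used in the replacement. The doubled-word example is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $text = "this is is a test";   # hypothetical input with a doubled word

# \1 inside the pattern must match the same text that group 1 matched.
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

# $1 in the replacement part keeps a single copy of the doubled word.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/;

print "doubled word: $dup\n";
print "$fixed\n";
```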

 

$& : the matched text (MATCH).

$` : the unmatched text to the left of the match (PREMATCH).

$' : the unmatched text to the right of the match (POSTMATCH).

NOTE: Using any of these 3 special vars can slow down your program (mainly on Perl versions before 5.20). So, try to use $1, $2... instead.
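
One way to follow that advice is to capture the three parts explicitly, as in the sketch below. The sample string and variable names are assumptions.

```perl
use strict;
use warnings;

my $s = "hello perl world";   # hypothetical sample string

# Group 1 plays the role of PREMATCH, group 2 of MATCH, group 3 of POSTMATCH.
# /s lets . match newlines too, in case the string spans several lines.
my ($pre, $match, $post) = $s =~ /^(.*?)(perl)(.*)$/s;

print "[$pre][$match][$post]\n";
```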

 

my $a1 = "0123456789";
my $a2 = "finish";

$a2 =~ /((\w)(\w))/;
print $1, " ", $2, " ", $3, "\n";   # fi f i
$a1 =~ /(\d+)/;
print $1, "\n";                     # captures all of 0123456789
$a1 =~ /(\d)+/;
print $1, "\n";                     # the group matches each digit in turn, but $1 keeps only the last one, so this prints 9

 

10. extended regular expression --- /x switch

#parse a line from `ls -l`

[root@sxvvr10 lzc]# ls -l
total 1255098
-rwxr-xr-x   1 root     root        1320 Dec 11 15:46 sysadm.pl
-rwxr-xr-x   1 root     root          78 Nov 27 17:17 test.pl

 

m/
^                            # Start of line.
([\w-]+)\s+                  # $1 - File permissions.
(\d+)\s+                     # $2 - Hard links.
(\w+)\s+                     # $3 - User.
(\w+)\s+                     # $4 - Group.
(\d+)\s+                     # $5 - File size.
(\w+\s+\d+\s+[\d:]+)\s+      # $6 - Date and time.
(.*)                         # $7 - Filename.
$                            # End of line.
/x;
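
As a usage sketch, the same pattern can be applied to one line of the listing and the captures unpacked into named variables. The variable names are hypothetical; the input line is copied from the `ls -l` output above.

```perl
use strict;
use warnings;

my $line = "-rwxr-xr-x   1 root     root        1320 Dec 11 15:46 sysadm.pl";

# In list context the match returns the seven captures in order.
my ($perms, $links, $user, $group, $size, $date, $name) = $line =~ m/
    ^                            # start of line
    ([\w-]+)\s+                  # file permissions
    (\d+)\s+                     # hard links
    (\w+)\s+                     # user
    (\w+)\s+                     # group
    (\d+)\s+                     # file size
    (\w+\s+\d+\s+[\d:]+)\s+      # date and time
    (.*)                         # filename
    $                            # end of line
/x;

print "$name: $size bytes, owner $user, modified $date\n";
```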

 

11. greediness --- maximal matching: a regular expression tries to match as much as it possibly can.

$_ = "abracadabra";
/(a.*a)/       # greedy -- $1 = "abracadabra"
/(a.*?a)/     # not greedy -- $1 = "abra"

 

 
