perl: regular expression

License: CC 4.0 BY-SA


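This article introduces Perl regular expressions: the default variable `$_`, the match and no-match binding operators `=~` and `!~`, modifiers such as `/i` and `/g`, metacharacters, quantifiers, character classes, alternation, grouping, capturing matched text, and extended regular expressions, with several examples.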

1. works on $_ by default

If no variable is given, the match (m//) and substitution (s///) operators work on $_.

while (<>)
{
    print if m/leo/;      # print the line if it contains "leo"
    print s/leo/LEO/;     # print the number of matches found and replaced
    print $_;             # print the modified line
}

[root@sxvvr10 lzc]# ./sysadm.pl
leoisOK
leoisOK
1LEOisOK

2. binding operators

=~ : binds a variable to a pattern; true if the pattern matches.

!~ : the negated form; true if the pattern does not match.
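A minimal sketch of both binding operators, assuming a made-up sample string in a variable called $name (not part of the original script). It also shows that s/// can be bound with =~ to change a variable other than $_.

```perl
use strict;
use warnings;

my $name = "leo is OK";    # hypothetical sample data

# =~ binds the pattern to $name instead of $_; true on a match.
my $has_leo = ($name =~ /leo/) ? 1 : 0;

# !~ is the negation: true when the pattern does NOT match.
my $no_digits = ($name !~ /\d/) ? 1 : 0;

# s/// bound with =~ modifies the bound variable; here a copy of $name.
(my $shout = $name) =~ s/leo/LEO/;

print "has_leo=$has_leo no_digits=$no_digits shout=$shout\n";
```

The copy-then-substitute form `(my $copy = $orig) =~ s/.../.../` leaves the original string untouched.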

3. modifiers

/i --- case insensitive

/g --- global matches
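A short sketch of /i and /g, assuming an invented sample string. In list context a /g match returns every match rather than only the first, and s///g returns the number of replacements.

```perl
use strict;
use warnings;

my $text = "Leo met leo and LEO";    # hypothetical sample data

# Without /i the match is case sensitive: only the lowercase "leo" is found.
my @exact = $text =~ /leo/g;

# /i ignores case; /g in list context returns all matches.
my @any_case = $text =~ /leo/gi;

# With s///, /g replaces every occurrence and returns how many were replaced.
my $masked = $text;
my $count  = ($masked =~ s/leo/X/gi);

print scalar(@exact), " ", scalar(@any_case), " $count $masked\n";
```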

multi-line modifiers:

Mode          Specified with       ^ matches...       $ matches...     Dot matches newline
default       neither /s nor /m    start of string    end of string    No
single-line   /s                   start of string    end of string    Yes
multi-line    /m                   start of line      end of line      No
multi-line    /m and /s            start of line      end of line      Yes

$_ = "This is some text
and some more text spanning
several lines";
print if /^and.*spanning$/;      # no match: without /m, ^ matches only at the start of the string
print if /^and.*spanning$/m;     # match
print if /^and.*lines$/m;        # no match because . cannot match \n
print if /^and.*lines$/ms;       # match

4. meta chars

\w --- [0-9a-zA-Z_] (for ASCII text)

\W --- any character other than \w.

\b --- word boundary: the zero-length position between a \w and a \W character (in either order), or between a \w character and the start or end of the string.

\B --- any position that is not a \b.

e.g. /\bword\B/ matches in "$word2": $ is \W and w is \w, so \b matches the zero-length position between $ and w; d and 2 are both \w, so \B matches the position between d and 2.
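To see where the word boundary assertions succeed and fail, the sketch below tries the same pattern against a few invented inputs. The sample strings and the %result hash are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Single quotes keep the literal $ in the first sample.
my @samples = ('$word2', 'word', 'sword2', 'a word2b');    # hypothetical inputs

my %result;
for my $s (@samples) {
    # \b needs a boundary before "w"; \B needs a non-boundary after "d".
    $result{$s} = ($s =~ /\bword\B/) ? "match" : "no match";
    printf "%-10s %s\n", $s, $result{$s};
}
```

'word' fails because the end of the string after "d" is a boundary, and 'sword2' fails because "s" and "w" are both word characters, so there is no boundary before "w".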

5. quantifier

{n} --- exactly n times

{n,} --- at least n times

{n,m} --- at least n and at most m times
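A small sketch of the three counted quantifiers applied to \d, assuming a made-up list of digit strings. The patterns are anchored with ^ and $ so that the whole string must fit the count.

```perl
use strict;
use warnings;

my @ids = ("7", "42", "123", "1234", "12345");    # hypothetical sample data

my @exactly3    = grep { /^\d{3}$/ }   @ids;    # {n}:   exactly 3 digits
my @at_least3   = grep { /^\d{3,}$/ }  @ids;    # {n,}:  3 or more digits
my @two_to_four = grep { /^\d{2,4}$/ } @ids;    # {n,m}: between 2 and 4 digits

print "exactly 3:  @exactly3\n";
print "at least 3: @at_least3\n";
print "2 to 4:     @two_to_four\n";
```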

6. character classes

[a-zA-Z0-9_]: matches exactly one character from the class.
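The following sketch illustrates that a class consumes a single character unless a quantifier follows it, and that a leading ^ inside the brackets negates the class. The test strings are invented for the example.

```perl
use strict;
use warnings;

# One class, no quantifier: the whole string must be a single character.
my $one_char  = ("a"  =~ /^[a-zA-Z0-9_]$/)  ? 1 : 0;
my $two_chars = ("ab" =~ /^[a-zA-Z0-9_]$/)  ? 1 : 0;

# Adding + lets the class match one or more characters.
my $with_plus = ("ab" =~ /^[a-zA-Z0-9_]+$/) ? 1 : 0;

# [^...] matches one character that is NOT in the class.
my $outsider  = ("a-b" =~ /([^a-zA-Z0-9_])/) ? $1 : "";

print "$one_char $two_chars $with_plus '$outsider'\n";
```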

7. alternation

Matches any one of a set of alternatives, each of which can be a longer string.

print if /one|two|three/;      # "one", "two" or "three" anywhere, e.g. xxonexx, xxtwoxx or xxthreexx
print if /^one|two|three$/;    # "one" at the start, "two" anywhere, or "three" at the end
print if /^(one|two|three)$/;  # exactly "one", "two" or "three"
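Because | binds more loosely than the anchors, the second and third patterns behave quite differently. A sketch with invented inputs that makes the difference visible:

```perl
use strict;
use warnings;

my @inputs = ("one", "one more", "a two b", "last three", "three more");    # hypothetical

# ^ applies only to "one" and $ only to "three".
my @loose  = grep { /^one|two|three$/ }   @inputs;

# The parentheses make both anchors apply to every alternative.
my @strict = grep { /^(one|two|three)$/ } @inputs;

print "loose:  ", join(", ", @loose),  "\n";
print "strict: ", join(", ", @strict), "\n";
```

"three more" is rejected by both patterns: it does not start with "one", does not contain "two", and does not end with "three".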

8. groupings -- round brackets

/\b([^\W\d_aeiou][aeiou]){2}\b/ # match a consonant followed by a vowel, twice in a row, e.g. tutu or tofu

Round brackets also capture the matched text into scalars ($1, $2, ...); use (?:...) instead to group without capturing.

my $str = "It is perl I like most";
if ($str =~ m/(?:perl.*most)/)
{
    print "matched! $1\n"; # Use of uninitialized value $1 in concatenation (.) or string at ./sysadm.pl line 32.
} # matched!
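A sketch contrasting capturing and non-capturing groups on the same sentence. The alternatives "ruby" and "least" are assumptions added only to give the groups something to choose between. In list context a match returns the captured groups, which makes the difference easy to see.

```perl
use strict;
use warnings;

my $str = "It is perl I like most";

# Both groups capture, so the match returns two values.
my @both = $str =~ /(perl|ruby).*(most|least)/;

# (?:...) still groups the alternation but captures nothing,
# so the only captured value is the second group.
my @only_last = $str =~ /(?:perl|ruby).*(most|least)/;

print scalar(@both), ": @both\n";
print scalar(@only_last), ": @only_last\n";
```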

9. Capture matched strings to scalars.

$1, $2, $3 ...: the text matched by the 1st, 2nd, 3rd ... set of parentheses. They can be used in the replacement part of a substitution, "s/(...)/$1/", but not inside the pattern that sets them, "m/(...) $1/"; inside the pattern use the backreferences \1, \2 ... instead.

$&: the matched text (MATCH).

$`: the text to the left of the match (PREMATCH).

$': the text to the right of the match (POSTMATCH).

NOTE: Using the above 3 special vars can slow down your program (mainly in Perl versions before 5.20), so prefer $1, $2 ... where possible.

my $a1 = "0123456789";
my $a2 = "finish";

$a2 =~ /((\w)(\w))/;
print $1, " ", $2, " ", $3, "\n";   # fi f i
$a1 =~ /(\d+)/;
print $1, "\n";                     # captures all of 0123456789
$a1 =~ /(\d)+/;
print $1, "\n";                     # the group matches each digit in turn, but $1 keeps only the last one, so this prints 9
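The difference between a backreference inside the pattern (\1) and a capture variable in the replacement ($1) can be sketched with a doubled-word fixer. The sample sentences are invented for this example.

```perl
use strict;
use warnings;

my $text = "this is is a test";    # hypothetical sample data

# \1 inside the pattern means "the same text group 1 just matched".
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

# $1 in the replacement part refers to the finished capture.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/;

# $1 and $2 can also be reordered in the replacement.
(my $swapped = "hello world") =~ s/(\w+) (\w+)/$2 $1/;

print "$dup | $fixed | $swapped\n";
```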

10. extended regular expression --- /x switch

#parse a line from `ls -l`

[root@sxvvr10 lzc]# ls -l
total 1255098
-rwxr-xr-x 1 root root 1320 Dec 11 15:46 sysadm.pl
-rwxr-xr-x 1 root root 78 Nov 27 17:17 test.pl

m/
^                              # Start of line.
([\w-]+)\s+                    # group 1 - File permissions.
(\d+)\s+                       # group 2 - Hard links.
(\w+)\s+                       # group 3 - User.
(\w+)\s+                       # group 4 - Group.
(\d+)\s+                       # group 5 - File size.
(\w+\s+\d+\s+[\d:]+)\s+        # group 6 - Date and time.
(.*)                           # group 7 - Filename.
$                              # End of line.
/x;
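As a runnable sketch, the same pattern can be applied to one line of the listing above, with the captures stored in a hash. The hash keys (perms, links, user, group, size, mtime, name) are hypothetical names picked for this example. Comments inside the pattern avoid the / delimiter and the $ sigil, since both are still processed before the comment is discarded.

```perl
use strict;
use warnings;

# Sample line copied from the ls -l listing.
my $line = "-rwxr-xr-x 1 root root 1320 Dec 11 15:46 sysadm.pl";

# In list context a successful match returns the captured groups in order.
my @fields = $line =~ m/
    ^
    ([\w-]+) \s+                    # permissions
    (\d+)    \s+                    # hard links
    (\w+)    \s+                    # user
    (\w+)    \s+                    # group
    (\d+)    \s+                    # size in bytes
    (\w+ \s+ \d+ \s+ [\d:]+) \s+    # date and time
    (.*)                            # file name
    $
/x;

# Hypothetical field names, chosen only for this sketch.
my %entry;
@entry{qw(perms links user group size mtime name)} = @fields;

print "$entry{name}: $entry{size} bytes, owner $entry{user}, modified $entry{mtime}\n";
```

With /x, whitespace in the pattern is ignored outside character classes, which is why the literal separators have to be written as \s+.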

11. greediness --- maximal matching: a quantifier tries to match as much text as it possibly can.

$_ = "abracadabra";
/(a.*a)/;  # greedy -- $1 = "abracadabra"
/(a.*?a)/; # not greedy -- $1 = "abra"
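A runnable sketch of the same comparison, plus a second case where greediness commonly surprises: extracting text between two tags. The tagged string is an invented example.

```perl
use strict;
use warnings;

my $s = "abracadabra";
my ($greedy) = $s =~ /(a.*a)/;     # .* takes as much as possible
my ($lazy)   = $s =~ /(a.*?a)/;    # .*? takes as little as possible

my $html = "<b>one</b> and <b>two</b>";    # hypothetical sample data
my ($greedy_tag) = $html =~ m{<b>(.*)</b>};     # runs on to the LAST </b>
my ($lazy_tag)   = $html =~ m{<b>(.*?)</b>};    # stops at the FIRST </b>

print "$greedy | $lazy\n";
print "$greedy_tag | $lazy_tag\n";
```

Using m{...} as the delimiter avoids having to escape the / in the closing tag.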