1. work on $_ by default
i.e. the match and substitution operators work on $_ when no variable is given:
while (<>)
{
print if m/leo/;
print s/leo/LEO/; # prints the number of substitutions made (1 here, since there is no /g)
print $_;
}
[root@sxvvr10 lzc]# ./sysadm.pl
leoisOK
leoisOK
1LEOisOK
2. binding operators
=~ : true if the pattern matches
!~ : true if the pattern does not match
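A minimal sketch of the two binding operators. The sample string is only an illustration (borrowed from the shell prompt above), and the variable names are made up for this example.

```perl
use strict;
use warnings;

my $host = "sxvvr10";   # sample value, assumed for illustration

# =~ is true when the pattern matches the string on its left
my $has_digits = ($host =~ /\d+/) ? 1 : 0;

# !~ is true when the pattern does NOT match the string on its left
my $no_upper = ($host !~ /[A-Z]/) ? 1 : 0;

print "has digits: $has_digits, no upper case: $no_upper\n";
```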
3. modifiers
/i --- case insensitive
/g --- global matches
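A short sketch of how /i and /g behave, using a made-up sample string. It assumes nothing beyond core Perl; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "Leo met leo and LEO";   # sample string, assumed for illustration

# /i ignores case, so "Leo" at the start matches the lowercase pattern
my $ci = ($text =~ /^leo/i) ? 1 : 0;

# /g in list context returns every match; with /i that is all three names
my @all   = $text =~ /leo/gi;
my $count = scalar @all;

# s///g replaces every occurrence and returns how many were replaced
my $copy     = $text;
my $replaced = ($copy =~ s/leo/X/gi);

print "$ci $count $replaced [$copy]\n";
```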
multi-line modifiers:
Mode          Specified with       ^ matches...      $ matches...     Dot matches newline
default       neither /s nor /m    start of string   end of string    No
single-line   /s                   start of string   end of string    Yes
multi-line    /m                   start of line     end of line      No
multi-line    /m and /s            start of line     end of line      Yes
$_ = "This is some text
and some more text spanning
several lines";
print if /^and.*spanning$/; # no match
print if /^and.*spanning$/m; # match
print if /^and.*lines$/m; # no match, because . cannot match \n
print if /^and.*lines$/ms; # match
4. meta chars
\w --- [0-9a-zA-Z_]
\W --- any character other than \w.
\b --- zero-width match at the boundary between \w and \W (in either order), or at the start or end of the string next to a \w.
\B --- zero-width match at any position where \b does not match.
e.g. /\bword\B/ matches the text $word2: $ is \W and w is \w, so \b matches the zero-width position between $ and w; d and 2 are both \w, so \B matches the zero-width position between d and 2.
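The word-boundary example can be checked with a small sketch like the following. Single quotes keep the sample text literal; the third test string is an extra illustration that is not in the notes.

```perl
use strict;
use warnings;

# Single quotes keep the dollar sign literal (no variable interpolation).
my $s = '$word2';

# Boundary between $ and w, and no boundary between d and 2: matches.
my $m1 = ($s =~ /\bword\B/) ? 1 : 0;

# d and 2 are both \w, so there is no boundary after d: no match.
my $m2 = ($s =~ /\bword\b/) ? 1 : 0;

# Spaces on both sides give a boundary at each end: matches.
my $m3 = ('a word here' =~ /\bword\b/) ? 1 : 0;

print "$m1 $m2 $m3\n";   # prints: 1 0 1
```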
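5. quantifiers
? --- zero or one
* --- zero or more
+ --- one or more
{n} --- exactly n
{n,} --- at least n
{n,m} --- at least n, at most m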
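A rough sketch that exercises each quantifier against a small test string. The strings and variable names are invented for illustration; every pattern is anchored with ^ and $ so that the count of b characters decides the result.

```perl
use strict;
use warnings;

my $opt   = ('ac'     =~ /^ab?c$/)     ? 1 : 0;  # ? allows zero b's      -> 1
my $star  = ('ac'     =~ /^ab*c$/)     ? 1 : 0;  # * allows zero b's      -> 1
my $plus  = ('ac'     =~ /^ab+c$/)     ? 1 : 0;  # + needs at least one b -> 0
my $exact = ('abbc'   =~ /^ab{2}c$/)   ? 1 : 0;  # exactly two b's        -> 1
my $min   = ('abc'    =~ /^ab{2,}c$/)  ? 1 : 0;  # needs two or more      -> 0
my $range = ('abbbbc' =~ /^ab{2,3}c$/) ? 1 : 0;  # four b's is too many   -> 0

print "$opt $star $plus $exact $min $range\n";    # prints: 1 1 0 1 0 0
```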
6. character classes
[a-zA-Z0-9_]: matches exactly one character from the class.
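A small sketch showing that a class consumes a single character unless a quantifier follows it. The test strings are made up; the negated form [^...] is the same one used in the grouping example later in these notes.

```perl
use strict;
use warnings;

my $one  = ('x'  =~ /^[a-zA-Z0-9_]$/)  ? 1 : 0;  # one character from the class -> 1
my $two  = ('xy' =~ /^[a-zA-Z0-9_]$/)  ? 1 : 0;  # the class covers only one    -> 0
my $many = ('xy' =~ /^[a-zA-Z0-9_]+$/) ? 1 : 0;  # + repeats the class          -> 1
my $neg  = ('-'  =~ /^[^a-zA-Z0-9_]$/) ? 1 : 0;  # ^ inside [] negates it       -> 1

print "$one $two $many $neg\n";   # prints: 1 0 1 1
```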
7. alternation
| matches any one of a set of alternative strings.
print if /one|two|three/; # matches xxonexx or xxtwoxx or xxthreexx
print if /^one|two|three$/; # matches one at the start, two anywhere, or three at the end
print if /^(one|two|three)$/; # matches exactly one or two or three
8. groupings -- round brackets
/\b([^\W\d_aeiou][aeiou]){2}\b/ # matches a consonant followed by a vowel, twice in a row, e.g. tutu or tofu
Round brackets also capture the matched text into scalars ($1, $2...); use (?:...) instead to group without capturing.
my $str = "It is perl I like most";
if ($str =~ m/(?:perl.*most)/)
{
print "matched! $1\n"; # $1 is not set here: Use of uninitialized value $1 in concatenation (.) or string at ./sysadm.pl line 32.
} # output: matched!
9. Capture matched strings to scalars.
$1, $2, $3 ... hold the text matched by the 1st, 2nd, 3rd ... sets of parentheses. They can be used in the replacement part of a substitution, "s/(...)/$1/", but not inside the pattern that sets them, "m/(...) $1/"; use the backreferences \1, \2 ... there instead.
$& : the matched text (MATCH).
$` : the unmatched text to the left of the match (PREMATCH).
$' : the unmatched text to the right of the match (POSTMATCH).
NOTE: Using any of the above 3 special vars can slow down your program (mainly on Perl versions before 5.20). So, try to use $1, $2... instead.
my $a1 = "0123456789";
my $a2 = "finish";
$a2 =~ /((\w)(\w))/;
print $1, " ", $2, " ", $3, "\n"; # fi f i
$a1 =~ /(\d+)/;
print $1, "\n"; # captures all of 0123456789
$a1 =~ /(\d)+/;
print $1, "\n"; # the group matches each digit in turn, but $1 keeps only the last one, so this prints 9
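The difference between \1 inside the pattern and $1 in the replacement can be sketched with a doubled-word check. The sample sentence and variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $line = "this is is a test";   # sample text with a repeated word

# \1 inside the pattern refers back to whatever group 1 matched.
my $dup = ($line =~ /\b(\w+) \1\b/) ? $1 : '';

# $1 is used in the replacement part to keep a single copy of the word.
my $fixed = $line;
$fixed =~ s/\b(\w+) \1\b/$1/;

print "repeated word: $dup\n";
print "fixed: $fixed\n";
```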
10. extended regular expression --- /x switch
#parse a line from `ls -l`
[root@sxvvr10 lzc]# ls -l
total 1255098
-rwxr-xr-x 1 root root 1320 Dec 11 15:46 sysadm.pl
-rwxr-xr-x 1 root root 78 Nov 27 17:17 test.pl
m/
^ # Start of line.
([\w-]+)\s+ # $1 - File permissions.
(\d+)\s+ # $2 - Hard links.
(\w+)\s+ # $3 - User
(\w+)\s+ # $4 - Group
(\d+)\s+ # $5 - File size
(\w+\s+\d+\s+[\d:]+)\s+ # $6 - Date and time.
(.*) # $7 - Filename.
$ # End of line.
/x;
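As a sketch of how the pattern might be used, the following applies it to one line of the listing above and collects the captures with a list-context match. The variable names are hypothetical. The "total" line of the listing would not match this pattern.

```perl
use strict;
use warnings;

# Sample line copied from the listing above.
my $line = "-rwxr-xr-x 1 root root 1320 Dec 11 15:46 sysadm.pl";

# In list context a successful match returns the captured groups in order.
my ($perm, $links, $user, $group, $size, $date, $name) = $line =~ m/
    ^
    ([\w-]+)\s+                # file permissions
    (\d+)\s+                   # hard links
    (\w+)\s+                   # user
    (\w+)\s+                   # group
    (\d+)\s+                   # file size
    (\w+\s+\d+\s+[\d:]+)\s+    # date and time
    (.*)                       # file name
    $
/x;

if (defined $name) {
    print "$name: size $size, owner $user, modified $date\n";
} else {
    print "line did not match\n";
}
```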
11. greediness --- quantifiers are greedy by default: the regular expression tries to match as much text as it possibly can. Adding ? after a quantifier makes it non-greedy.
$_ = "abracadabra";
/(a.*a)/; # greedy -- $1 = "abracadabra"
/(a.*?a)/; # not greedy -- $1 = "abra"
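A runnable sketch of the same comparison, capturing each result into its own variable so the two can be printed side by side. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $s = "abracadabra";

# Greedy: .* grabs as much as possible, then backs off to the last a.
my ($greedy) = $s =~ /(a.*a)/;

# Non-greedy: .*? grabs as little as possible, stopping at the next a.
my ($lazy) = $s =~ /(a.*?a)/;

print "greedy: $greedy\n";   # prints: greedy: abracadabra
print "lazy:   $lazy\n";     # prints: lazy:   abra
```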