Perl Tip

Perl one-liner as a replacement for iconv (UTF-8 to GBK)

perl -mEncode -pe 'Encode::from_to($_, "utf-8", "gbk")'

perl -mEncode -pe '$_=Encode::encode("gbk", Encode::decode("utf-8", $_))'
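Both one-liners should produce the same bytes. The short script below is a sketch that checks this on an in-memory string instead of a file. The sample text "你好" (written as \x{...} escapes so the script does not depend on its own file encoding) and the variable names are illustrative assumptions. from_to converts its first argument in place and returns the length of the result in octets; the second variant decodes to Perl characters first and then encodes to GBK.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Illustrative input: "你好" (U+4F60 U+597D) encoded as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', "\x{4F60}\x{597D}");

# Variant 1 (like the first one-liner): convert a copy in place.
my $copy = $utf8_bytes;
my $len  = from_to($copy, 'utf-8', 'gbk');   # octet length, undef on failure

# Variant 2 (like the second one-liner): bytes -> characters -> GBK bytes.
my $gbk = encode('gbk', decode('utf-8', $utf8_bytes));

print "same result\n" if $copy eq $gbk;
print uc(unpack('H*', $gbk)), "\n";          # prints C4E3BAC3
```

Going through characters (variant 2) is the more general pattern, because the decoded string can also be inspected or modified before it is encoded again.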


------------------------------------------------------------------------------

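use strict;
use warnings;
use Encode qw(decode encode);
# Assumes this script file is saved in GBK (cp936), so the literal holds GBK bytes.
my $bytes = "abc你好wert";
my $text  = decode('cp936', $bytes);     # GBK bytes -> Perl character string
my ($han) = $text =~ m/(\p{Han}+)/;      # first run of Han (Chinese) characters
print encode('cp936', $han), "\n";       # characters -> GBK bytes for output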

Match any character that is not Chinese (Han): \P{Han}
Match any Chinese (Han) character: \p{Han}
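Both properties only work as intended on decoded character strings, not on raw bytes. A minimal sketch, assuming a hypothetical sample string written with \x{...} escapes (你好 and 世界) so that no source-file encoding is involved: it collects every run of Han characters with \p{Han} and deletes everything else with \P{Han}.

```perl
use strict;
use warnings;

# Hypothetical character string: "abc你好wert世界".
my $text = "abc\x{4F60}\x{597D}wert\x{4E16}\x{754C}";

my @han_runs = $text =~ m/(\p{Han}+)/g;    # every run of Han characters
(my $han_only = $text) =~ s/\P{Han}+//g;   # delete all non-Han characters

binmode STDOUT, ':encoding(UTF-8)';        # encode characters on output
print scalar(@han_runs), " runs; ", $han_only, "\n";   # prints "2 runs; 你好世界"
```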

The Perl FAQ entry "How do I strip blank space from the beginning/end of a string?" states that using

s/^\s+|\s+$//g;

is slower than doing it in two steps:

s/^\s+//;
s/\s+$//;

Why is this combined statement noticeably slower than the separate ones (for any input string)?

The Perl regex runtime runs much quicker when working with 'fixed' or 'anchored' substrings rather than 'floated' substrings. A substring is fixed when you can lock it to a certain place in the source string. Both '^' and '$' provide that anchoring. However, when you use alternation '|', the compiler doesn't recognize the choices as fixed, so it uses less optimized code to scan the whole string. And at the end of the process, looking for fixed strings twice is much, much faster than looking for a floating string once. On a related note, reading perl's regcomp.c will make you go blind.
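You can check the claim on your own perl with the core Benchmark module. This is a sketch: the input string and the sub names trim_combined and trim_two_step are hypothetical, and the measured ratio depends on the perl version and on the input, so treat the numbers as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: whitespace on both ends around a long middle part.
my $input = "   " . ("word " x 200) . "   \n";

sub trim_combined { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }
sub trim_two_step { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }

# Both versions must agree before their speed is worth comparing.
die "results differ\n" unless trim_combined($input) eq trim_two_step($input);

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    combined => sub { trim_combined($input) },
    two_step => sub { trim_two_step($input) },
});
```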

Update: Here are some additional details. If perl was compiled with debugging support, you can run it with the '-Dr' flag and it will dump the regex compilation data. Here's what you get:

# debugperl -Dr -e 's/^\s+//g'
Compiling REx `^\s+'
size 4 Got 36 bytes for offset annotations.
first at 2
synthetic stclass "ANYOF[\11\12\14\15 {unicode_all}]".
   1: BOL(2)
   2: PLUS(4)
   3:   SPACE(0)
   4: END(0)
stclass "ANYOF[\11\12\14\15 {unicode_all}]" anchored(BOL) minlen 1

# debugperl -Dr -e 's/^\s+|\s+$//g'
Compiling REx `^\s+|\s+$'
size 9 Got 76 bytes for offset annotations.
   1: BRANCH(5)
   2:   BOL(3)
   3:   PLUS(9)
   4:     SPACE(0)
   5: BRANCH(9)
   6:   PLUS(8)
   7:     SPACE(0)
   8:   EOL(9)
   9: END(0)
minlen 1

Note the word 'anchored' in the first dump. The second dump has no such line, so the engine cannot restrict where a match may start.
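If your perl was not built with debugging support, the re pragma gives a similar dump: use re 'debug' (or perl -Mre=debug on the command line) prints the compiled program of each regex in its lexical scope to STDERR. A minimal sketch; the exact wording of the dump differs between perl versions (newer perls print something like anchored(SBOL) rather than anchored(BOL)), and the variable names are illustrative.

```perl
use strict;
use warnings;

my ($anchored, $alternated);
{
    # Lexically scoped: only regexes compiled inside this block are dumped.
    use re 'debug';
    $anchored   = qr/^\s+/;        # dump should contain an "anchored(...)" line
    $alternated = qr/^\s+|\s+$/;   # dump shows BRANCH nodes and no anchoring
}
```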

How do I strip blank space from the beginning/end of a string?

(contributed by brian d foy)

A substitution can do this for you. For a single line, you want to replace all the leading or trailing whitespace with nothing. You can do that with a pair of substitutions:

s/^\s+//;
s/\s+$//;

You can also write that as a single substitution, although it turns out the combined statement is slower than the separate ones. That might not matter to you, though:

s/^\s+|\s+$//g;

In this regular expression, the alternation matches either at the beginning or at the end of the string, because alternation has a lower precedence than the anchors: the pattern means ^\s+ or \s+$, not ^(\s+|\s+)$. With the /g flag, the substitution makes all possible matches, so it removes both. Remember that the trailing newline matches \s+, and the $ anchor can match at the absolute end of the string, so the newline disappears too. Just add the newline back on output, which has the added benefit of preserving "blank" lines (lines consisting entirely of whitespace), which ^\s+ would otherwise remove entirely:

while( <> ) {
    s/^\s+|\s+$//g;
    print "$_\n";
}
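To see the blank-line behavior without preparing a file, the same loop can read from an in-memory filehandle. This is a sketch with a hypothetical three-line input whose middle line holds only whitespace; reading from \$input stands in for <>.

```perl
use strict;
use warnings;

# Hypothetical input: the middle line is whitespace only.
my $input = "  first line  \n   \t \n\tlast line\t\n";

# An in-memory filehandle replaces <> so the example is self-contained.
open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my $output = '';
while (my $line = <$fh>) {
    $line =~ s/^\s+|\s+$//g;   # trims both ends, including the newline
    $output .= "$line\n";      # put the newline back
}
close $fh;

print $output;   # the blank middle line survives as an empty line
```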

For a multi-line string, you can apply the regular expression to each logical line in the string by adding the /m flag (for "multi-line"). With the /m flag, the $ matches before an embedded newline, so it doesn't remove it. This pattern still removes the newline at the end of the string:

$string =~ s/^\s+|\s+$//gm;

Remember that lines consisting entirely of whitespace will disappear, since the first part of the alternation can match the entire string and replace it with nothing. If you need to keep embedded blank lines, you have to do a little more work. Instead of matching any whitespace (since that includes a newline), just match the other whitespace:

$string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
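The difference between the two /m patterns is easiest to see side by side. In this sketch, based on a hypothetical string whose middle line is whitespace only, the \s version removes the blank line and the final newline, while the [\t\f ] version keeps every line break.

```perl
use strict;
use warnings;

# Hypothetical multi-line string; the middle line is whitespace only.
my $string = "  a  \n   \n  b  \n";

# \s also matches "\n", so a match can run across line breaks.
(my $with_s = $string) =~ s/^\s+|\s+$//gm;
# [\t\f ] excludes "\n", so every line break survives.
(my $with_class = $string) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

for my $result ($with_s, $with_class) {
    (my $shown = $result) =~ s/\n/\\n/g;   # make newlines visible
    print "$shown\n";                      # prints a\nb, then a\n\nb\n
}
```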

Reposted from: https://my.oschina.net/kuerant/blog/115177
