Sed


Understanding the difference between current-line addressing in ed and global-line addressing in sed is very important. In ed you use addressing to expand the number of lines that are the object of a command; in sed, you use addressing to restrict the number of lines affected by a command.

command [options] script filename

sed -f scriptfile inputfile

$ sed '
> s/ MA/, Massachusetts/
> s/ PA/, Pennsylvania/
> s/ CA/, California/' list
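To check the script above without the book's list file, you can run the same three substitutions on one made-up address line (hypothetical sample data). The instructions can be given as repeated -e options or kept in a script file read with -f. This is a minimal sketch assuming a POSIX shell with mktemp available:

```shell
#!/bin/sh
# Hypothetical sample line standing in for the book's "list" file.
line='John Daggett, 341 King Road, Plymouth MA'

# 1. One instruction per -e option.
printf '%s\n' "$line" |
  sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' -e 's/ CA/, California/'

# 2. The same instructions kept in a script file and run with -f.
script=$(mktemp)
cat > "$script" <<'EOF'
s/ MA/, Massachusetts/
s/ PA/, Pennsylvania/
s/ CA/, California/
EOF
printf '%s\n' "$line" | sed -f "$script"
rm -f "$script"
```

Both commands print "John Daggett, 341 King Road, Plymouth, Massachusetts". A script file is easier to maintain once the list of edits grows beyond a few lines.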

The -n option suppresses the automatic output. When specifying this option, each instruction intended to produce output must contain a print command, p.
sed -n -e 's/MA/Massachusetts/p' list
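A small sketch of -n combined with the p flag, using two hypothetical input lines in place of the list file. Only the line on which the substitution succeeds is printed:

```shell
# -n turns off automatic output; the p flag prints the pattern space
# only after a successful substitution.
printf '%s\n' 'Hubert Sims, Boston MA' 'Amy Wilde, Erie PA' |
  sed -n -e 's/MA/Massachusetts/p'      # Hubert Sims, Boston Massachusetts
```

Dropping -n while keeping p prints each changed line twice: once because of p and once through the automatic output.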

awk -v var=value 'instruction' inputfile

.
Matches any single character except newline. In awk, dot can match newline also.
*
Matches any number (including zero) of the single character (including a character specified by a regular expression) that immediately precedes it.
[...] Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as first character inside brackets reverses the match to all characters except newline and those listed in the class. In awk, newline will also match. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.
^
First character of regular expression, matches the beginning of the line. Matches the beginning of a string in awk, even if the string contains embedded newlines.
$
As last character of regular expression, matches the end of the line. Matches the end of a string in awk, even if the string contains embedded newlines.
\{n,m\}
Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. \{n\} will match exactly n occurrences, \{n,\} will match at least n occurrences, and \{n,m\} will match any number of occurrences between n and m. (sed and grep only, may not be in some very old versions.)
\
Escapes the special character that follows it.
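A few of the basic metacharacters above, tried with grep on made-up input. This assumes a POSIX grep, which supports the \{n,m\} interval syntax in basic regular expressions:

```shell
# \{2,\} : at least two a's before a b (other text may surround the match)
printf '%s\n' ab aab aaab b | grep 'a\{2,\}b'      # aab, aaab
# ^ and $ anchor the match: the whole line must be exactly "aab"
printf '%s\n' ab aab aaab b | grep '^a\{2\}b$'     # aab
# \. is a literal dot, while a bare . would match any character
printf '%s\n' 'a.c' 'abc' | grep 'a\.c'            # a.c
```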

Extended Metacharacters (egrep and awk):
+
Matches one or more occurrences of the preceding regular expression.
?
Matches zero or one occurrences of the preceding regular expression.
|
Specifies that either the preceding or following regular expression can be matched (alternation).
()
Groups regular expressions.
{n,m}
Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. {n} will match exactly n occurrences, {n,} will match at least n occurrences, and {n,m} will match any number of occurrences between n and m. (POSIX egrep and POSIX awk, not in traditional egrep or awk.)
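The extended metacharacters can be tried with grep -E, which is equivalent to egrep (recent GNU grep releases warn that the egrep command is obsolescent). A minimal sketch on hypothetical input:

```shell
# ? : the "u" is optional
printf '%s\n' color colour colouur | grep -E '^colou?r$'   # color, colour
# {3} : exactly three digits (POSIX ERE interval)
printf '%s\n' 12 123 1234 | grep -E '^[0-9]{3}$'           # 123
# + : one or more digits after the v
printf '%s\n' v1 v v22 | grep -E '^v[0-9]+$'               # v1, v22
```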

Inside square brackets, the standard metacharacters lose their meaning.

Special Characters in Character Classes
\ Escapes any special character (awk only)
- Indicates a range when not in the first or last position.
^ Indicates a reverse match only when in the first position.

The close bracket (]) is interpreted as a member of the class if it occurs as the first character in the class (or as the first character after a circumflex). The hyphen loses its special meaning within a class if it is the first or last character.
In awk, you could also use the backslash to escape the hyphen or close bracket wherever either one occurs in the range, but the syntax is messier.
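A short check of these bracket rules on hypothetical input: a close bracket placed first and a hyphen placed last are both ordinary members of the class, and a leading circumflex negates the class:

```shell
# []-] is the class containing "]" and "-"
printf '%s\n' 'a]b' 'a-b' 'ab' | grep '[]-]'   # a]b, a-b
# [^0-9] matches any character that is not a digit
printf '%s\n' 123 12a | grep '[^0-9]'          # 12a
```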

POSIX distinguishes two kinds of regular expressions: Basic Regular Expressions (BREs), the kind used by grep and sed, and Extended Regular Expressions (EREs), the kind used by egrep and awk.

Character classes. A POSIX character class consists of keywords bracketed by [: and :], and it is recognized only inside a bracket expression, e.g. [[:alpha:]]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on (see Table 3.3).
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters (includes the space character)
[:punct:] Punctuation characters
[:space:] Whitespace characters
[:upper:] Uppercase characters
[:xdigit:] Hexadecimal digits
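Because the classes only work inside a bracket expression, the brackets are doubled in practice. A minimal sketch with made-up input:

```shell
# Lines made only of letters
printf '%s\n' abc 123 'x y' A1 | grep '^[[:alpha:]]*$'   # abc
# Lines made only of letters and digits
printf '%s\n' abc 123 'x y' A1 | grep '^[[:alnum:]]*$'   # abc, 123, A1
# Lines containing any whitespace character
printf '%s\n' abc 123 'x y' A1 | grep '[[:space:]]'      # x y
```

Written without the outer brackets, [:alpha:] is an ordinary bracket expression matching the characters :, a, l, p and h; GNU grep rejects that form with an error message because it is almost always a mistake.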

Collating symbols. A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .].

Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].

The vertical bar (|) metacharacter, part of the extended set of metacharacters, allows you to specify a union of regular expressions.
compan(y|ies)
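Applied to a few hypothetical words, compan(y|ies) picks out both the singular and the plural. In sed, the same ERE needs the -E option (supported by GNU sed and BSD sed):

```shell
printf '%s\n' company companies companion | grep -E 'compan(y|ies)'   # company, companies
printf '%s\n' company companies | sed -E 's/compan(y|ies)/firm/'      # firm, firm
```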

$ egrep "(^| )[\"[{(]*book[]})\"?\!.,;:'s]*( |$)" bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? No.
told them about (the books) until it
Here are the books that you requested
Yes, it is a good book for children
amazing that it was called a "harmful book" when
once you get to the end of the book, you can't believe

Some programs, such as ex/vi and the GNU versions of grep and sed, provide a special metacharacter for matching a string at the beginning of a word, \<, and one for matching a string at the end of a word, \>. Used as a pair, they match a string only when it is a complete word.
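A sketch with GNU grep on made-up lines; only lines containing book as a whole word are printed:

```shell
# "bookworm" fails at \>, "ebook" fails at \<
printf '%s\n' book bookworm 'a good book.' ebook | grep '\<book\>'   # book, a good book.
```

Here grep -w book gives the same result.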

$ gres '"[^"]*"' '00' sampleLine
.Se 00 "Appendix"
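The gres example relies on "[^"]*" being unable to run past a closing quote. A minimal sed sketch on a hypothetical line with two quoted strings shows the difference from the greedy ".*":

```shell
line='.Se "Appendix" "Full Program Listings"'   # hypothetical sample line
# ".*" is greedy: it runs from the first quote to the LAST quote on the line
printf '%s\n' "$line" | sed 's/".*"/00/'        # .Se 00
# "[^"]*" cannot cross a quote, so it stops at the first closing quote
printf '%s\n' "$line" | sed 's/"[^"]*"/00/'     # .Se 00 "Full Program Listings"
```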

1........5
5........10
10.......20
100......200
$ sed 's/\([0-9][0-9]*\)\.\{5,\}\([0-9][0-9]*\)/\1-\2/' sample
1-5
5-10
10-20
100-200

This mistake is simply a problem of the order of the commands in the script.

Sed also maintains a second temporary buffer called the hold space. You can copy the contents of the pattern space to the hold space and retrieve them later.
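A well-known use of the hold space is reversing the order of lines, as tac does. This is a sketch of that one-liner on made-up input:

```shell
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the pattern space into the hold space
# $p  : on the last line, print the accumulated (reversed) lines
printf '%s\n' one two three | sed -n '1!G;h;$p'   # three, two, one
```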

A sed command can specify zero, one, or two addresses. An address can be a regular expression describing a pattern, a line number, or a line addressing symbol.
If no address is specified, then the command is applied to each line.
If there is only one address, the command is applied to any line matching the address.
If two comma-separated addresses are specified, the command is performed on the first line matching the first address and all succeeding lines up to and including a line matching the second address.
If an address is followed by an exclamation mark (!), the command is applied to all lines that do not match the address.

The line number refers to an internal line count maintained by sed. This counter is not reset for multiple input files.
Similarly, the input stream has only one last line. It can be specified using the addressing symbol $.
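For example:
d                   Delete all lines.
1d                  Delete only the first line.
$d                  Delete only the last line.
/^$/d               Delete only blank lines.
/^\.TS/,/^\.TE/d    Delete each block from a line matching ^\.TS through the next line matching ^\.TE.
50,$d               Delete from line 50 through the last line.
1,/^$/d             Delete from the first line up to and including the next blank line.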
An exclamation mark (!) following an address reverses the sense of the match. For instance, the following script deletes all lines except those inside tbl input:
/^\.TS/,/^\.TE/!d
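Trying the address forms on numbered input (made-up data) shows a range, its negation, and the line counter continuing across several input files:

```shell
# A range and its negation (input lines are the numbers 1 to 5)
printf '%s\n' 1 2 3 4 5 | sed '2,4d'     # prints 1 and 5
printf '%s\n' 1 2 3 4 5 | sed '2,4!d'    # prints 2, 3 and 4

# The line counter is not reset between files: with two hypothetical
# two-line files, line 3 is the first line of the second file and
# $ is the last line of the last file.
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/a"
printf '%s\n' b1 b2 > "$dir/b"
sed -n '3p' "$dir/a" "$dir/b"            # b1
sed -n '$p' "$dir/a" "$dir/b"            # b2
rm -r "$dir"
```

GNU sed also has a -s option that treats each file as a separate stream, so that $ then refers to the last line of each file.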

Braces ({}) are used in sed to nest one address inside another or to apply multiple commands at the same address.
/^\.TS/,/^\.TE/{
    # delete blank lines only inside blocks of tbl input
    /^$/d
    s/^\.ps 10/.ps 8/
    s/^\.vs 12/.vs 10/
}
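A small sketch of the braces example with a hypothetical tbl block: the blank line inside .TS/.TE is removed, while the blank line outside it is kept:

```shell
printf '%s\n' text .TS '' row .TE '' more |
sed '/^\.TS/,/^\.TE/{
/^$/d
}'
# Output: text, .TS, row, .TE, (blank line), more
```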

/---/!s/--/\\(em/g
If you find a line containing three consecutive hyphens, don't apply the edit. On all other lines, the substitute command will be applied.

Substitution
[address]s/pattern/replacement/flags
n A number (1 to 512) indicating that a replacement should be made for only the nth occurrence of the pattern.
g Make changes globally on all occurrences in the pattern space. Normally only the first occurrence is replaced.
p Print the contents of the pattern space.
w file
Write the contents of the pattern space to file.
The substitute command is applied to the lines matching the address. If no address is specified, it is applied to all lines that match the pattern. If a regular expression is supplied as an address and the pattern is left empty (s//replacement/), the substitute command matches whatever the address matched.
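Some of the flags on a made-up line. The 2g combination (replace the second and every later occurrence) is GNU sed behavior; POSIX does not specify what mixing a number with g does. The last command shows an empty pattern reusing the regex from the address:

```shell
echo 'one one one' | sed 's/one/two/2'      # one two one
echo 'one one one' | sed 's/one/two/g'      # two two two
echo 'one one one' | sed 's/one/two/2g'     # one two two  (GNU sed)
# s//ONE/ reuses /one/ from the address; p prints only changed lines
printf '%s\n' 'one one' zero | sed -n '/one/s//ONE/p'   # ONE one
```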

In the replacement section, only the following characters have special meaning:
& Replaced by the string matched by the regular expression.
\n Replaced by the nth substring previously specified in the pattern using "\(" and "\)".
\ Used to escape the ampersand, the backslash, and the substitution command's delimiter. In addition, it can be used to escape the newline and create a multiline replacement string.
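A sketch of each special replacement character on hypothetical input:

```shell
# \1 and \2 refer to the \( \) groups; & is the whole match
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1 [&]/'   # world hello [hello world]
# \& is a literal ampersand
echo 'AT' | sed 's/AT/A \& T/'                                 # A & T
# \/ is a literal slash, the same character as the delimiter
echo '/usr/bin' | sed 's/\/usr/\/opt/'                         # /opt/bin
# A backslash followed by a newline puts a newline in the replacement
echo 'a,b' | sed 's/,/\
/'                                                             # a and b on separate lines
```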

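#! /bin/sh
# Turn every unique ".XX text" index entry in the given files into a
# sed command of the form "/^\.XX /s/text/text/", ready to be edited.
grep "^\.XX" "$@" | sort -u |
sed '
s/^\.XX \(.*\)$/\/^\\.XX \/s\/\1\/\1\//'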
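Running only the sed step on one hypothetical index entry shows the command it generates:

```shell
printf '%s\n' '.XX sed commands' |
sed 's/^\.XX \(.*\)$/\/^\\.XX \/s\/\1\/\1\//'
# Output: /^\.XX /s/sed commands/sed commands/
```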

Delete
The delete command is also a command that can change the flow of control in a script. That is because once it is executed, no further commands are executed on the "empty" pattern space.
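A minimal sketch of that flow-of-control effect on made-up lines: once d runs, the following s command never sees the deleted line:

```shell
printf '%s\n' keep drop keep | sed -e '/drop/d' -e 's/$/ (seen by s)/'
# Output: "keep (seen by s)" twice; nothing is printed for "drop"
```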

Append, Insert, and Change
[line-address]a\
text

[line-address]i\
text

[address]c\
text

The insert command places the supplied text before the current line in the pattern space.
The append command places it after the current line.
The change command replaces the content of the pattern space with the supplied text.
The text must begin on the next line. To input multiple lines of text, each successive line must end with a backslash, with the exception of the very last line.
For example:
/<Larry's Address>/i\
4600 Cross Court \
French Lick, IN
The append and insert commands can be applied only to a single line address, not to a range of lines. The change command, however, can address a range of lines. In this case, it replaces all addressed lines with a single copy of the text. In other words, it deletes each line in the range, but the supplied text is output only once.
For example:
/^From /,/^$/c\
<Mail Header Removed>
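A sketch of the range form of c on a hypothetical mail message, assuming GNU sed, which prints the text once when the end of the range is reached. The header lines through the first blank line collapse into a single line:

```shell
printf '%s\n' 'From alice' 'Subject: hi' '' 'Body text' |
sed '/^From /,/^$/c\
<Mail Header Removed>'
# Output: <Mail Header Removed>, Body text
```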

The insert and append commands do not affect the contents of the pattern space. The supplied text will not match any address in subsequent commands in the script, nor can those commands affect the text (unlike text produced by the s command). No matter what changes occur to alter the pattern space, the supplied text will still be output appropriately. The supplied text also does not affect sed's internal line counter (nor do the s and d commands).

$ cat data
line1

$ sed '1{
i\
before line1
a\
after line1
s/line1/subline1\nsubline2/g
s/line/ /g
}' data

before line1
sub 1
sub 2
after line1

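Print line number
[address]=
The equal sign (=) writes the line number of the addressed line to standard output on a line of its own, before the pattern space is output.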
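Two common uses of =, on made-up input: counting lines and reporting where a pattern occurs:

```shell
printf '%s\n' a b c | sed -n '$='     # 3  (the number of the last line, like wc -l)
printf '%s\n' a b c | sed -n '/b/='   # 2
```

Without -n, the number appears on its own line just before the addressed line, so sed '/b/=' prints a, 2, b, c.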

The next command (n) outputs the contents of the pattern space and then reads the next line of input without returning to the top of the script.
[address]n
For example:
/^\.H1/{
n
/^$/d
}
Match any line beginning with the string '.H1', then print that line and read in the next line. If that line is blank, delete it.
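The .H1 example as a runnable sketch with hypothetical input; only the blank line directly after the heading is deleted:

```shell
printf '%s\n' '.H1 Title' '' Body '' More |
sed '/^\.H1/{
n
/^$/d
}'
# Output: .H1 Title, Body, (blank line), More
```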

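The quit command (q) causes sed to stop reading new input lines (and stop sending them to the output). The current pattern space is still printed before sed exits, unless -n is in effect. Because the rest of the input is never read, q can save a lot of time on large files.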
[address]q
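A short sketch: the first command stops after line 3 instead of reading all of seq's output, and the second prints the first line containing a 5 and then quits:

```shell
seq 1 100000 | sed '3q'           # 1, 2, 3
seq 1 10 | sed -n '/5/{p;q;}'     # 5
```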