Shell Reference - the grep Command

Original post, last modified 2025-01-10 16:56:16

Licensed under CC 4.0 BY-SA

First published 2024-10-02 22:26:22

Manual page in Ubuntu: man grep

Grep, short for “global regular expression print”, is a command that searches files for lines matching text patterns written as regular expressions.

The `grep` command is used in Unix-like operating systems to search for patterns within files. It's a powerful tool for text processing and finding specific strings or regular expressions within file contents.

NAME

grep, egrep, fgrep, rgrep - print lines that match patterns

SYNOPSIS

You can use `grep` to search for patterns within a file by following this syntax:

grep [OPTION...] PATTERNS [FILE...]
grep [OPTION...] -e PATTERNS ... [FILE...]
grep [OPTION...] -f PATTERN_FILE ... [FILE...]

DESCRIPTION

grep searches for PATTERNS in each FILE. PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern. Typically PATTERNS should be quoted when grep is used in a shell command.

A FILE of “-” stands for standard input. If no FILE is given, recursive searches examine the working directory, and nonrecursive searches read standard input.
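The two points above (newline-separated patterns, and "-" as standard input) can be sketched as follows. This is a minimal illustration that assumes GNU grep; the file menu.txt and its contents are hypothetical.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'apple pie\nbanana split\ncherry tart\n' > "$tmp/menu.txt"

# Two patterns separated by a newline inside one quoted argument:
# a line is printed if it matches either pattern.
two_patterns=$(grep 'apple
cherry' "$tmp/menu.txt")

# A FILE of "-" makes grep read standard input.
from_stdin=$(printf 'apple pie\nbanana split\n' | grep 'banana' -)

printf '%s\n' "$two_patterns" "$from_stdin"
rm -rf "$tmp"
```

Quoting the pattern, as the description recommends, is what lets the embedded newline reach grep as part of a single argument.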

Debian also includes the variant programs egrep, fgrep and rgrep. These programs are the same as grep -E, grep -F, and grep -r, respectively. These variants are deprecated upstream, but Debian provides for backward compatibility. For portability reasons, it is recommended to avoid the variant programs, and use grep with the related option instead.

OPTIONS

1, Generic Program Information

--help
Output a usage message and exit.

-V, --version
Output the version number of grep and exit.

2, Pattern Syntax

-E, --extended-regexp
Interpret PATTERNS as extended regular expressions (EREs, see below).

-F, --fixed-strings
Interpret PATTERNS as fixed strings, not regular expressions.

-G, --basic-regexp
Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default.

-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.
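As a rough illustration of how the syntax options change the meaning of the same characters, the sketch below runs three searches against one hypothetical file (names.txt). It assumes GNU grep and leaves out -P, which is not available in every build.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'a.c\nabc\na+c\naac\n' > "$tmp/names.txt"

# Default (-G, basic regex): the dot matches any single character.
bre=$(grep 'a.c' "$tmp/names.txt")

# -F: the pattern is a fixed string, so the dot is a literal dot.
fixed=$(grep -F 'a.c' "$tmp/names.txt")

# -E: + is a repetition operator, so 'a+c' means one or more "a", then "c".
ere=$(grep -E 'a+c' "$tmp/names.txt")

printf 'BRE:\n%s\nfixed:\n%s\nERE:\n%s\n' "$bre" "$fixed" "$ere"
rm -rf "$tmp"
```

Here the basic-regex search prints all four lines, -F prints only a.c, and -E prints only aac.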

3, Matching Control

-e PATTERNS, --regexp=PATTERNS
Use PATTERNS as the patterns. If this option is used multiple times or is combined with the -f (--file) option, search for all patterns given. This option can be used to protect a pattern beginning with “-”.

-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing. If FILE is - , read patterns from standard input.

-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.

-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
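The difference between a plain search, -w and -x can be seen on a small hypothetical file (pets.txt). This is a sketch assuming GNU grep.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'cat\ncat food\nconcatenate\nmy_cat\n' > "$tmp/pets.txt"

# Plain search: "cat" may appear anywhere, so all four lines are counted.
plain=$(grep -c 'cat' "$tmp/pets.txt")

# -w: the match must be a whole word. The underscore is a word
# constituent, so "my_cat" does not qualify.
word=$(grep -w 'cat' "$tmp/pets.txt")

# -x: the match must span the whole line.
line=$(grep -x 'cat' "$tmp/pets.txt")

printf 'plain count: %s\nword:\n%s\nline:\n%s\n' "$plain" "$word" "$line"
rm -rf "$tmp"
```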

Common Options:

`-c`: Print a count of matching lines instead of the lines themselves (a line with several matches is counted once).
`-i`: Ignore case (e.g., `grep -i "apple" fruits.txt` will match "Apple", "APPLE", etc.)
`-r`: Search recursively in directories (e.g., `grep -r "apple" /path/to/directory/`)
`-n`: Show line numbers where the pattern occurs (e.g., `grep -n "apple" fruits.txt`)
`-o`: Shows only the matching parts of the line.
`-v`: Invert match, showing lines that do **not** contain the pattern (e.g., `grep -v "apple" fruits.txt`)
`-w`: Match whole words only (e.g., `grep -w "apple" fruits.txt`)
`-l`: List only filenames containing the match (e.g., `grep -l "apple" *.txt`)
`-L`, --files-without-match: Suppress normal output; instead print the name of each input file from which no output would normally have been printed.
`-A NUM`: Show `NUM` lines **after** the match (e.g., `grep -A 2 "apple" fruits.txt`)
`-B NUM`: Show `NUM` lines **before** the match (e.g., `grep -B 2 "apple" fruits.txt`)
`-C NUM`: Show `NUM` lines **before and after** the match (e.g., `grep -C 2 "apple" fruits.txt`)
`-e`, --regexp=PATTERNS: Use PATTERNS for matching
`-E`, --extended-regexp : PATTERNS are extended regular expressions
--include=GLOB, Search only files whose base name matches GLOB (using wildcard matching as described under --exclude). If contradictory --include and --exclude options are given, the last matching one wins. If no --include or --exclude options match, a file is included unless the first such option is --include.
--exclude=GLOB, Skip any command-line file with a name suffix that matches the pattern GLOB, using wildcard matching; a name suffix is either the whole name, or a trailing part that starts with a non-slash character immediately after a slash (/) in the name. When searching recursively, skip any subfile whose base name matches GLOB; the base name is the part after the last slash. A pattern can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.
--exclude-from=FILE, Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).
--exclude-dir=GLOB, Skip any command-line directory with a name suffix that matches the pattern GLOB. When searching recursively, skip any subdirectory whose base name matches GLOB. Ignore any redundant trailing slashes in GLOB.
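Several of the options listed above can be tried together on a throwaway directory tree. The sketch below assumes GNU grep; the directory layout and file contents are hypothetical.

```shell
# Hypothetical directory tree; every name below is illustrative only.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf 'int apple;\nint pear;\n' > "$tmp/src/main.c"
printf 'Apple notes\napple apple\n' > "$tmp/src/notes.txt"
printf 'apple cache\n' > "$tmp/build/out.c"
old=$PWD
cd "$tmp"

# -c counts matching lines; -o prints every match on its own line.
count_lines=$(grep -c 'apple' src/notes.txt)
count_hits=$(grep -o 'apple' src/notes.txt | wc -l | tr -d ' ')

# -l lists files with a match; --include and --exclude-dir filter the recursion.
c_files=$(grep -rl --include='*.c' 'apple' . | sort)
skip_build=$(grep -rl --exclude-dir='build' 'apple' . | sort)

# -L lists files in which nothing matched.
no_pear=$(grep -L 'pear' src/main.c src/notes.txt || true)

# -A 1 prints one line of trailing context after each match.
after=$(grep -A 1 'apple' src/main.c)

printf '%s\n' "$count_lines" "$count_hits" "$c_files" "$skip_build" "$no_pear" "$after"
cd "$old"
rm -rf "$tmp"
```

The globs are quoted so that the shell passes them to grep unchanged instead of expanding them against the current directory.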

Notes:
1, The file name argument at the end may use shell wildcards.
For example: grep -in "success" /var/log/*.log
Several file names can also be listed:
For example: grep "text" file1 file2 file3
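A brief sketch of a wildcard file argument, assuming GNU grep. The log files a.log and b.log are hypothetical stand-ins for real logs.

```shell
# Hypothetical log files; names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'login success\n' > "$tmp/a.log"
printf 'Login SUCCESS\nlogin failed\n' > "$tmp/b.log"
old=$PWD
cd "$tmp"

# The shell expands *.log to "a.log b.log" before grep runs. With more
# than one file, grep prefixes each output line with the file name.
hits=$(grep -in 'success' *.log)

printf '%s\n' "$hits"
cd "$old"
rm -rf "$tmp"
```

Each output line has the form file:line-number:text, for example a.log:1:login success.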

REGULAR EXPRESSIONS

The period . matches any single character.

1, Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list. If the first character of the list is the caret ^ then it matches any character not in the list; it is unspecified whether it matches an encoding error. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

A named class is only valid inside a bracket expression, which is why the brackets are doubled. Separately, quote the pattern so that the shell does not treat the brackets as a glob:

$ grep -E '^b[[:alpha:]]+'
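The bracket-expression rules described above can be checked with a short sketch. It assumes GNU grep, and lines.txt is a hypothetical sample file.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'room 101\nno digits here\nx-ray\n[tag]\n' > "$tmp/lines.txt"

# A named class goes inside a bracket expression, hence the double brackets.
digits=$(grep '[[:digit:]]' "$tmp/lines.txt")

# LC_ALL=C gives the traditional meaning of a range: [a-d] is [abcd].
range_count=$(LC_ALL=C grep -c '[a-d]' "$tmp/lines.txt")

# A literal ] goes first in the list and a literal - goes last.
punct=$(grep '[]-]' "$tmp/lines.txt")

printf '%s\n' "$digits" "$range_count" "$punct"
rm -rf "$tmp"
```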

2, Anchoring

The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.

3, The Backslash Character and Special Expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
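Line anchors and word-edge symbols can be compared on one hypothetical file (edges.txt). This sketch assumes GNU grep, where the backslash word symbols are available.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'start here\nrestart\nthe end\nending\nend\n' > "$tmp/edges.txt"

bol=$(grep '^start' "$tmp/edges.txt")        # "start" at the beginning of a line
eol=$(grep 'end$' "$tmp/edges.txt")          # "end" at the end of a line
word_start=$(grep '\<end' "$tmp/edges.txt")  # "end" at the beginning of a word
word_end=$(grep 'start\>' "$tmp/edges.txt")  # "start" at the end of a word
inside=$(grep '\Bstart' "$tmp/edges.txt")    # "start" not at the beginning of a word

printf '%s\n' "$bol" "$eol" "$word_start" "$word_end" "$inside"
rm -rf "$tmp"
```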

4, Repetition

A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU extension.
{n,m} The preceding item is matched at least n times, but not more than m times.
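The repetition operators listed above can be exercised with extended regular expressions. This is a minimal sketch assuming GNU grep; reps.txt is a hypothetical file, and each pattern is anchored with ^ and $ so that only the number of "b" characters decides the result.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'ac\nabc\nabbc\nabbbc\nabbbbc\n' > "$tmp/reps.txt"

optional=$(grep -E '^ab?c$' "$tmp/reps.txt")      # zero or one b
star=$(grep -cE '^ab*c$' "$tmp/reps.txt")         # zero or more b
plus=$(grep -cE '^ab+c$' "$tmp/reps.txt")         # one or more b
exact=$(grep -E '^ab{2}c$' "$tmp/reps.txt")       # exactly two b
at_least=$(grep -E '^ab{3,}c$' "$tmp/reps.txt")   # three or more b
between=$(grep -E '^ab{1,2}c$' "$tmp/reps.txt")   # one or two b

printf '%s\n' "$optional" "$star" "$plus" "$exact" "$at_least" "$between"
rm -rf "$tmp"
```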

5, Concatenation

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated expressions.

6, Alternation

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either alternate expression.

7, Precedence

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole expression may be enclosed in parentheses to override these precedence rules and form a subexpression.
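The precedence rules are easiest to see by comparing a pattern with and without parentheses. The sketch below assumes GNU grep and uses hypothetical sample words.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'gray\ngrey\ngraph\nkey\n' > "$tmp/colors.txt"

# Concatenation binds tighter than alternation, so this pattern means
# "starts with gra" OR "ends with ey".
loose=$(grep -E '^gra|ey$' "$tmp/colors.txt")

# Parentheses limit the alternation to the single letter.
grouped=$(grep -E '^gr(a|e)y$' "$tmp/colors.txt")

# Repetition binds tighter than concatenation: 'ab+' repeats only the b,
# while '(ab)+' repeats the whole group.
rep_item=$(printf 'abab\nabb\n' | grep -Ex 'ab+')
rep_group=$(printf 'abab\nabb\n' | grep -Ex '(ab)+')

printf '%s\n' "$loose" "$grouped" "$rep_item" "$rep_group"
rm -rf "$tmp"
```

The ungrouped pattern prints all four lines, while the grouped one prints only gray and grey.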

8, Back-references and Subexpressions
The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
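A back-reference can be sketched in both syntaxes, assuming GNU grep. The word list is hypothetical.

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'hello\nworld\nbookkeeper\nabcabc\n' > "$tmp/words.txt"

# Basic syntax: \(.\) captures one character and \1 requires the same
# character again, so this finds doubled letters.
doubled=$(grep '\(.\)\1' "$tmp/words.txt")

# Extended syntax: the group is written without backslashes.
repeated=$(grep -E '^(abc)\1$' "$tmp/words.txt")

printf '%s\n' "$doubled" "$repeated"
rm -rf "$tmp"
```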

9, Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
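The same search written in both syntaxes may make the difference concrete. This is a sketch assuming GNU grep, with a hypothetical sample file (syntax.txt).

```shell
# Hypothetical sample input; the file name and contents are illustrative only.
tmp=$(mktemp -d)
printf 'ab\nabab\na+b\n(ab)\n' > "$tmp/syntax.txt"

# In basic syntax, unescaped ( ) and + are ordinary characters.
bre_parens=$(grep '(ab)' "$tmp/syntax.txt")
bre_plus=$(grep 'a+b' "$tmp/syntax.txt")

# Backslashed in basic syntax, plain in extended syntax: same meaning.
bre_group=$(grep -x '\(ab\)\{2\}' "$tmp/syntax.txt")
ere_group=$(grep -Ex '(ab){2}' "$tmp/syntax.txt")

printf '%s\n' "$bre_parens" "$bre_plus" "$bre_group" "$ere_group"
rm -rf "$tmp"
```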

Example1:

To search for the word "apple" in a file called `fruits.txt`, use:

grep "apple" fruits.txt

This prints every line of `fruits.txt` that contains the string "apple", including inside longer words such as "pineapple".

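To search the current directory tree recursively for the whole word copy_to_user and print line numbers:

grep -rnw 'copy_to_user'

--include and --exclude take shell-style globs, not regular expressions, so ^ and $ are not anchors there, and there is no --include!= form. Quote the glob so that the shell does not expand it:

grep -nr 'PRODUCT' ./ --include='*.[ch]'
grep -nr 'PRODUCT' ./ --exclude='*.[ch]'

The first command searches only .c and .h files; the second searches everything except them.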

Example2:

The following example outputs the location and contents of any line containing “f” and ending in “.c”, within all files in the current directory whose names contain “g” and end in “.h”. The -n option outputs line numbers, the -- argument treats expansions of “*g*.h” starting with “-” as file names not options, and the empty file /dev/null causes file names to be output even if only one file name happens to be of the form “*g*.h”.

$ grep -n -- 'f.*\.c$' *g*.h /dev/null
argmatch.h:1:/* definitions and prototypes for argmatch.c

The only line that matches is line 1 of argmatch.h. Note that the regular expression syntax used in the pattern differs from the globbing syntax that the shell uses to match file names.
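The effect of the extra /dev/null argument can be reproduced with a single hypothetical header file (gfoo.h). This sketch assumes GNU grep.

```shell
# Hypothetical header file; the name and contents are illustrative only.
tmp=$(mktemp -d)
printf '/* prototypes for foo.c\n' > "$tmp/gfoo.h"
old=$PWD
cd "$tmp"

# Only one file matches the glob, so grep omits the file name.
single=$(grep -n -- 'f.*\.c$' *g*.h)

# Adding /dev/null gives grep two file operands, so it prints the name.
with_null=$(grep -n -- 'f.*\.c$' *g*.h /dev/null)

printf '%s\n' "$single" "$with_null"
cd "$old"
rm -rf "$tmp"
```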

Examples - Test in Ubuntu

1, Basic Regular Expressions (BRE)

1.1 Literal Characters:

grep "word" filename

Matches lines containing the string "word", including inside longer words such as "sword".

1.2 Any Single Character:

grep "b.t" filename

Matches lines containing "bat", "bit", "bot", etc. The dot `.` represents any single character.

1.3 Character Classes:

grep "b[aeiou]t" filename

Matches lines containing "bat", "bet", "bit", "bot", or "but". The square brackets `[aeiou]` define a character class.

1.4 Negated Character Classes:

grep "b[^aeiou]t" filename

Matches lines containing "b-t", "b_t", etc., but not "bat", "bet", "bit", etc. The caret `^` inside square brackets negates the class.

1.5 Anchors

Start of Line:

grep "^start" filename

Matches lines beginning with "start". The caret `^` outside square brackets represents the start of a line.

End of Line:

grep "end$" filename

Matches lines ending with "end". The dollar sign `$` represents the end of a line.

1.6 Zero or More Repetitions:

grep "fo*" filename

Matches lines containing "f", "fo", "foo", "fooo", etc. The asterisk `*` matches zero or more of the preceding character.

1.7 One or More Repetitions:

grep "fo\+" filename

Matches lines containing "fo", "foo", "fooo", etc. The backslashed `\+` matches one or more of the preceding character. (Note: `\+` is the basic-regex spelling, a GNU extension; in extended regex, just use `+`.)

2, Extended Regular Expressions (ERE)

To use extended regular expressions with `grep`, use the `-E` option or the `egrep` command.

2.1 Alternation:

grep -E "cat|dog" filename

Matches lines containing "cat" or "dog".

2.2 Grouping:

grep -E "(ab|cd)ef" filename

Matches lines containing "abef" or "cdef".

2.3 Zero or One Repetition:

grep -E "fo?" filename

Matches lines containing "f" or "fo". The question mark `?` matches zero or one of the preceding character.

2.4 Exactly N Repetitions:

grep -E "o{2}" filename