Unix cut command

Latest recommended article published on 2022-11-03 11:11:52


Article tags:

#shell #operating-system #awk

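This article explains how to use the Unix/Linux cut command to select specific columns or fields from a file. Practical examples show the basic usage of cut and how it is applied in different scenarios.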

Original article: http://www.softpanorama.org/Tools/cut.shtml

The external cut command displays selected columns or fields from each line of a file. It is the UNIX equivalent of the projection operation of relational algebra (it picks columns, not rows). If the capabilities of cut are not enough, the alternatives are AWK and Perl.

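Contrary to a common belief, cut does not use the shell variable IFS (Input Field Separators) to determine where to split fields: IFS controls word splitting inside the shell itself, while cut always splits on TAB unless another delimiter is given with -d. You can check the current value of IFS with set | grep IFS. You can also set it; in bash or ksh93, for example

IFS=$' \t\n'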

The most typical usage is cutting one or several columns from a file (often a log file) to create a new file. For example:

cut -d ' ' -f 2-7

retrieves the second to seventh fields, assuming that each field is separated by a single (note: single) blank. Option -d specifies a single-character delimiter (in the example above it is a blank) that serves as the field separator. Option -f specifies the range of fields included in the output (here, fields two to seven). Note that option -d presupposes the use of option -f.
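To see what this command does, you can feed it a made-up line of eight words separated by exactly one blank each (the input is purely illustrative). Note that cut writes the delimiter back between the selected fields:

```sh
#!/bin/sh
# Hypothetical sample: eight words separated by exactly one blank each.
printf '%s\n' 'w1 w2 w3 w4 w5 w6 w7 w8' | cut -d ' ' -f 2-7
# prints: w2 w3 w4 w5 w6 w7
```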

Cut can work in two modes:

column selection (each column starts at a certain fixed offset, defined as a range from-to)
separator-delimited selection (with the column separator being a single character like blank, comma, colon, etc.). In this mode cut uses the delimiter defined by the -d option (as in the example above). By default cut uses TAB as the delimiter; it does not consult the shell variable IFS (Input Field Separators).

Cut is essentially a simple text-parsing tool, and unless the task at hand is also simple, you will be better off using other, more flexible text-parsing tools instead. On modern computers the difference in cost between invoking cut and invoking awk is negligible. You can also use Perl in command-line mode for the same task. If option -a (autosplit mode) is specified, each input line is split into the array @F, so a Perl emulation of cut consists of a simple print statement that outputs the necessary fields. The advantage of Perl is that the columns can be counted from the last one (using negative indexes).


Pages with collections of Perl command-line one-liners contain many interesting examples that can probably be adapted to your particular situation.

Here is an example of how to print the first and the second-to-last columns:

perl -lane 'print "$F[0]:$F[-2]"'    # -l already appends a newline to every print

Here's a simple one-line script that will print out the fourth word of every line, but also skip any line beginning with a # because it's a comment line.

perl -lane 'next if /^#/; print $F[3]'

The most popular modern use of cut is probably processing HTTP and proxy logs (see Tips).

Column selection mode

A column is one character position. In this mode cut acts as a generalized substr function for files. Classic Unix cut cannot count characters from the end of the line the way Perl's substr function can, but rcut can. This type of selection is specified with the -c option. List entries can be open (from the beginning, as in -5, or to the end, as in 6-) or closed (as in 6-9).

cut -c 4,5,20 foo          # cuts foo at columns 4, 5, and 20

cut -c 1-5 a.dat | more    # prints the first 5 characters of every line in the file a.dat

cut -c -5 a.dat | more     # same as above, but using an open range
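The list forms above can be checked on a made-up ten-character line, where each letter's position is easy to count:

```sh
#!/bin/sh
# Hypothetical one-line input: ten characters, a=1 ... j=10.
s='abcdefghij'
printf '%s\n' "$s" | cut -c 4,5,9    # dei    (columns 4, 5 and 9)
printf '%s\n' "$s" | cut -c 1-5      # abcde  (closed range)
printf '%s\n' "$s" | cut -c -5       # abcde  (open range from the start)
printf '%s\n' "$s" | cut -c 6-       # fghij  (open range to the end of the line)
printf '%s\n' "$s" | cut -c -3,8-    # abchij (two open ranges combined)
```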

Field selection mode

In this mode cut selects not characters but fields, delimited by the one-character delimiter specified with option -d. The list of fields is specified with the -f option (-f list).

cut -d ":" -f1,7 /etc/passwd  # cuts fields 1 and 7 from /etc/passwd

cut -d ":" -f 1,6- /etc/passwd # cuts field 1 and fields 6 through the end from /etc/passwd

The default delimiter is TAB. If space is used as a delimiter, be sure to put it in quotes (-d " ").

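Note: Another way to specify a blank (or another shell-sensitive character) is to escape it with a backslash (\). The following example prints the second blank-separated field of every line in the file /etc/passwd (there are two spaces after -d\: the first is the escaped delimiter, the second separates the arguments):

% cut -f2 -d\  /etc/passwd | more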

Line suppression option

In field selection mode, cut can suppress lines that do not contain the delimiter defined with option -d (option -s). Unless this option is specified, lines with no delimiters are passed to the output untouched.
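The difference is easy to see on a made-up input in which one line has no colon at all:

```sh
#!/bin/sh
# Made-up input: two lines contain the ':' delimiter, one does not.
input='root:x:0
no delimiter here
joan:x:202'

printf '%s\n' "$input" | cut -d : -f 1      # root, no delimiter here, joan
printf '%s\n' "$input" | cut -s -d : -f 1   # root, joan (the line without ':' is dropped)
```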

Complement selection (GNU cut only)

This is a GNU cut option only. Option --complement converts the set of selected bytes, characters or fields to its complement. It applies to whichever of -b, -c or -f is used. In this case you specify not the list of fields or character columns to be retained, but those that need to be excluded. In some cases that simplifies writing the selection range.
For example, instead of the example listed above:

cut -d ":" -f 1,6- /etc/passwd               # cuts fields 1 and 6 to the end of the line from /etc/passwd

you can specify:

cut -d ":" -f 2-5 --complement /etc/passwd   # same result: every field except 2 to 5
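A quick check that both forms agree, using the joan record from the AIX /etc/passwd example in the Examples section. This assumes GNU coreutils cut; BSD and AIX cut do not accept --complement:

```sh
#!/bin/sh
# One record in /etc/passwd format (from the AIX example in this article).
line='joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh'
printf '%s\n' "$line" | cut -d : -f 1,6-                # joan:/home/joan:/usr/bin/sh
printf '%s\n' "$line" | cut -d : -f 2-5 --complement    # same output (GNU cut only)
```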

By using pipes and shell output redirection operators, you can create new files with a subset of the columns or fields contained in the original file.

Usage in Shell

Sometimes cut is used in shell programming to select certain substrings from a variable, for example:

echo "Argument 1 = [$1]"
c=$(printf '%s\n' "$1" | cut -c6-8)   # characters 6 to 8 of the first argument
echo "Characters 6 to 8 = [$c]"

Output:

Argument 1 = [1234567890]
Characters 6 to 8 = [678]

This is one of many ways to perform such a selection. In all but the simplest cases, AWK or Perl are better tools for the job. If you are selecting fields of a shell variable, you should probably use the set command to split the variable into positional parameters and echo the desired positional parameter into the pipe.
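The set approach can be sketched as a small shell function; the function name print_user_shell and the sample record are hypothetical. Running set -- inside a function replaces only the function's own positional parameters, and globbing is switched off because the unquoted expansion would otherwise turn the * password field into a list of file names:

```sh
#!/bin/sh
# Hypothetical helper: print the user name and shell of one passwd-style record.
print_user_shell() {
    set -f            # no globbing: the "*" field must not expand to file names
    old_ifs=$IFS      # remember the current separators
    IFS=:             # split the unquoted expansion below on colons only
    set -- $1         # $1..$7 now hold the seven fields of the record
    IFS=$old_ifs      # restore normal word splitting
    set +f            # turn globbing back on
    echo "user=$1 shell=$7"
}

print_user_shell 'joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh'
# prints: user=joan shell=/usr/bin/sh
```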

For complex cases Perl is definitely the preferable tool. Moreover, several Perl reimplementations of cut exist; see, for example, Perl cut.

BTW, Perl implementations are more flexible and less capricious than the original Unix cut command written in C.

Notes on syntax

As I mentioned before, there are two variants of cut: the first is character column cut and the second is delimiter-based (parsing) cut. In both cases the option can be separated from its value by a space, for example

-d ' '

In other words, POSIX and GNU implementations of cut use "almost" standard lexical parsing of arguments, although most examples in books use the "old style", with arguments "glued" to options. The "glued" style of specifying arguments is generally an anachronism. Still, some delimiters are awkward to pass: cut does not interpret escape sequences, so a tab cannot be written as \t (that is two characters, and unquoted the shell simply turns \t into t). You have to pass a literal tab character (for example -d "$(printf '\t')", or -d $'\t' in bash), or simply rely on the default delimiter, which is TAB. You generally need to experiment with your particular implementation.
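A short sketch of these points on made-up input: TAB is the default delimiter, a literal tab can be produced portably with printf, and the glued and separated option styles give the same result:

```sh
#!/bin/sh
# Made-up tab-separated line: fields "a", "b c" and "d".
printf 'a\tb c\td\n' | cut -f 2                 # b c  (TAB is the default delimiter)
tab=$(printf '\t')                              # a literal tab, portable to any POSIX sh
printf 'a\tb c\td\n' | cut -d "$tab" -f 2       # b c  (same, delimiter given explicitly)
printf 'x:y:z\n' | cut -d: -f2                  # y    ("glued" style)
printf 'x:y:z\n' | cut -d : -f 2                # y    (separated style)
```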

1. Character column cut

     cut -c list [ file_list ]

Option:

-c list  Display (cut) the columns specified in list from the input data. Columns are counted from one, not from zero, so the first column is column 1. The list can be separated from the option by space(s), but no spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact columns to display. For example, the -c 1,4,7 notation cuts columns 1, 4, and 7 of the input. The -c -10,50- notation would select columns 1 through 10 and 50 through end-of-line (remember that columns are counted from one).

2. Delimiter-based (parsing) cut

     cut -f list [ -d char ] [ -s ] [ file_list ]

Options:

-d char  The character char is used as the field delimiter. It is usually quoted but can be escaped. The default delimiter is a tab character. To use a character that has special meaning to the shell, you must quote the character so the shell does not interpret it. For example, to use a single space as a delimiter, type -d' '.

-f list  Selects (cuts) the fields specified in list from the input data. Fields are counted from one, not from zero. No spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact fields to display. The most practically important ranges are "open" ranges, where either the starting field or the last field is not specified explicitly (omitted). For example:

Selection from the beginning of the line to a certain field is specified as -N, where N is the number of the field. For example
-f -5
Selection from a certain field to the end of the line (all fields starting from N) is specified as N-. For example -f 5-

A specification can be complex and include both individual fields and ranges. For example, -f 1,4,7 would select fields 1, 4, and 7. The -f2,4-6,8 would select field 2, fields 4 to 6 (a range) and field 8.
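These list forms can be checked on a hypothetical record whose fields are named after their positions. The last line shows a detail worth remembering: cut always writes the selected fields in input order, so -f 3,1 cannot be used to swap fields:

```sh
#!/bin/sh
# Hypothetical record with nine colon-separated fields named after their positions.
rec='f1:f2:f3:f4:f5:f6:f7:f8:f9'
printf '%s\n' "$rec" | cut -d : -f 1,4,7      # f1:f4:f7
printf '%s\n' "$rec" | cut -d : -f 2,4-6,8    # f2:f4:f5:f6:f8
printf '%s\n' "$rec" | cut -d : -f -3         # f1:f2:f3
printf '%s\n' "$rec" | cut -d : -f 7-         # f7:f8:f9
printf '%s\n' "$rec" | cut -d : -f 3,1        # f1:f3 (input order, not list order)
```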

Limitations

Please remember that cut is good only for simple cases. In complex cases AWK and Perl actually save your time. Limitations are many. Among them:

Delimiters are single characters; they are not regular expressions. This leads to huge disappointment when you try to parse a blank-delimited file with cut: multiple blanks are counted as multiple field separators.
Syntax is irregular and sometimes tricky. For example, a one-character delimiter can be quoted (-d ' ') or escaped (-d\ ), but not both: -d '\ ' passes two characters and is rejected.
Semantics are the most basic. Cut is essentially a text parser and as such is suitable mainly for parsing colon-delimited and similar files. Its functionality does not even match the level of the Fortran IV FORMAT statement.
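The first limitation is easy to demonstrate on a made-up line with runs of blanks. Squeezing the blanks with tr -s first, or switching to awk (which splits on runs of blanks by default), gives the intended field:

```sh
#!/bin/sh
# Made-up line with runs of blanks between the words.
line='alpha   beta  gamma'
printf '%s\n' "$line" | cut -d ' ' -f 2               # empty line: field 2 is the empty string between two blanks
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2   # beta (runs squeezed to one blank first)
printf '%s\n' "$line" | awk '{ print $2 }'            # beta
```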

Examples

[From AIX cut man page] To display several fields of each line of a file, enter:
cut -f1,5 -d : /etc/passwd
This displays the login name and full user name fields of the system password file. These are the first and fifth fields (-f 1,5) separated by colons (-d :).

For example, if the /etc/passwd file looks like this:

su:*:0:0:User with special privileges:/:/usr/bin/sh
daemon:*:1:1::/etc:
bin:*:2:2::/usr/bin:
sys:*:3:3::/usr/src:
adm:*:4:4:System Administrator:/var/adm:/usr/bin/sh
pierre:*:200:200:Pierre Harper:/home/pierre:/usr/bin/sh
joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh

The cut command produces:

su:User with special privileges
daemon:
bin:
sys:
adm:System Administrator
pierre:Pierre Harper
joan:Joan Brown

[From AIX cut man page] To display fields using a blank separated list, enter:
```
cut -f "1 2 3" -d : /etc/passwd
```
The cut command produces:
```
su:*:0
daemon:*:1
bin:*:2
sys:*:3
adm:*:4
pierre:*:200
joan:*:202
```
[From "The cut command" in the ebook Shell Scripting by Hamish Whittal]

The cut command has the ability to cut out characters or fields. cut uses delimiters.

The cut command uses delimiters to determine where to split fields, so the first thing we need to understand about cut is how it determines what its delimiters are. By default, cut splits fields on a single TAB character; it does not read the shell variable IFS (Input Field Separators), which only controls the shell's own word splitting.

Typing:
set | grep IFS
will show you the shell's current separator characters; by default, IFS contains a space, a tab and a newline.

Looking at the output of our free command, we successfully separated every field by a single space (remember the tr command!).

Similarly, if our delimiter between fields were a comma, we could set the delimiter within cut to be a comma using the -d switch:
cut -d ","
The cut command lets one cut on the number of characters or on the number of fields. Since we're only interested in fields 2, 3 and 4 of the Mem line (total, used and free memory), we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4
            
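Why do you need to specify -d " " even though IFS already contains a space? Because cut never consults IFS: its default delimiter is TAB, so any other delimiter has to be given with -d.

If this does not work on your system, check the delimiter passed with -d (and that tr -s really squeezed the blanks); changing IFS will not help, because cut ignores it.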

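Detour:

Setting shell variables is easy. In bash or ksh93, which understand $'...' quoting, restoring the default value looks like this:
IFS=$' \t\n'
In a shell without $'...' quoting, the tab and newline have to be entered literally between the quotes. The Korn shell uses the same assignment syntax as sh and bash; setenv belongs to the C shell, which does not use IFS for word splitting. In any case, IFS affects only the shell's own word splitting (read, unquoted expansions), not cut.

That ends this short detour.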

At this point, it would be nice to save the output to a file. So let's append this to a file called mem.stats:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
            
Every time you run this particular command it should append the output to the mem.stats file.
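Because the output of free differs from machine to machine, here is the same pipeline run on a canned sample in the usual procps layout (the numbers are made up), so that each stage can be checked. Only the Mem line survives the sed, and after tr -s its fields are Mem:, total, used, free and so on:

```sh
#!/bin/sh
# Made-up output in the layout of procps "free" (numbers are illustrative only).
sample='               total        used        free      shared  buff/cache   available
Mem:        8000000     3000000     2000000      100000     3000000     4500000
Swap:       2000000           0     2000000'

# Squeeze blank runs, keep only the Mem line, then take total, used and free.
printf '%s\n' "$sample" | tr -s ' ' | sed '/^Mem/!d' | cut -d ' ' -f 2-4
# prints: 8000000 3000000 2000000
```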

The -f switch allows us to cut based on fields. If we wanted to cut based on characters (e.g. characters 6 to 13, 15 and 17), we would use the -c switch.

To do this for the example above:
free | tr -s ' ' | sed '/^Mem/!d' | cut -c6-13,15,17 >> mem.stats
            
First Example in stages:
1. For the next example I'd like you to make sure that you've logged on as a user (potentially root) on one of your virtual terminals.

How do you get to a virtual terminal? Ctrl-Alt plus F1 or F2 or F3 etcetera.

It should prompt you for a username and a password. Log in as root, as yourself or as a different user, and once you've logged in, switch back to your X terminal with Alt-F7. If you weren't working in X at the beginning of this session, the Ctrl + Alt + F1 is not necessary: a simple Alt + F2 switches to another virtual terminal, and Alt + F1 returns you to the first one.

2. Run the who command:
who
This will tell us who is logged on to the system. We could also run the w command:
w
This will not only tell us who is logged on to our system, but what they're doing. Let's use the w command, since we want to save information about what users are doing on our system. We may also want to save information about how long they've been idle and what time they logged on.

3. Find out who is logged on to your system. Pipe the output of the w command into cut. This time, however, we're not going to use a delimiter to separate fields; we're going to cut on characters. We could say:
w | cut -c1-8
This tells cut to keep only the first eight characters of each line. Doing this, you will see that on the first line it cuts the time off after the first digit of the seconds. So in my case the time is now
09:57:24 
and it cuts off to
09:57:2
On the remaining lines it keeps just the beginning of the USER column, so you're left with USER and all the users currently logged onto your system. And that's cutting exactly 8 characters.

4. To cut characters 4 to 8:
w | cut -c4-8
This will produce slightly bizarre-looking output.

So cut can cut not only fields but also exact characters and ranges of characters. We can cut any number of characters in a line.
Second Example in stages:
Often cutting characters in a line is less than optimal, since you never know how long your usernames might be. Really long usernames would be truncated, which clearly would not be acceptable. Cutting on characters is rarely a long-term solution. It may work because your name is Sam, but not if your name is Jabberwocky!

1. Let's do a final example using cut. Using our password file:
cat /etc/passwd
I'd like to know all usernames on the system, and what shell each is using.

The password file has 7 fields separated by a ':'. The first field is the login username, the second is the password, which is an x (because it is kept in the shadow password file), the third field is the user id, the fourth is the group id, the fifth field is the comment, the sixth field is the user's home directory and the seventh field indicates the shell that the user is using. I'm interested in fields 1 and 7.

2. How would we extract the particular fields? Simple:^[6]
	cat /etc/passwd | cut -d: -f1,7
	cut -d: -f1,7 /etc/passwd    # the same, without cat
	
If we do this, we should end up with just the usernames and their shells. Isn't that a nifty trick?

3. Let's pipe that output to the sort command, to sort the usernames alphabetically:
cat /etc/passwd | cut -d: -f1,7 | sort
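To try the pipeline without depending on your own /etc/passwd, you can run it on a few records copied from the AIX example earlier in this article. LC_ALL=C is added here only to make the sort order independent of the locale:

```sh
#!/bin/sh
# Sample records in /etc/passwd format (from the AIX example in this article).
sample='su:*:0:0:User with special privileges:/:/usr/bin/sh
daemon:*:1:1::/etc:
pierre:*:200:200:Pierre Harper:/home/pierre:/usr/bin/sh
joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh'

# User name and login shell, sorted by user name.
printf '%s\n' "$sample" | cut -d : -f 1,7 | LC_ALL=C sort
# prints: daemon:, joan:/usr/bin/sh, pierre:/usr/bin/sh, su:/usr/bin/sh (one per line)
```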
                
Third example in stages
So this is a fairly simple way to extract information from files. The cut command doesn't only work with files; it also works with streams. We could do a listing that would produce a number of fields. If you recall, we used the tr command earlier to squeeze spaces.
ls -al 
If you look at this output, you will see lines of fields. Below is a quick summary of these fields and what they refer to.

Field  Meaning
1      permissions of the file
2      number of links to the file
3      owner (user) of the file
4      group of the file
5      size of the file
6      month the file was modified
7      day the file was modified
8      time the file was modified
9      name of the file

I'm particularly interested in the size and the name of each file.

1. Let's try and use our cut command in the same way that we used it for the password file:
ls -al | cut -d' ' -f5,9
The output is not as expected: cut treats every single space as a delimiter, but ls pads its columns with runs of several spaces, so the field numbers no longer line up. This presents us with a bit of a problem.

2. We could try using \t (tab) as the delimiter instead of a space; however, cut only accepts a single character (\t is two characters). An alternative way of inserting a special character like tab is to type Ctrl-V and then hit the Tab key.
^v + <tab> 
That inserts a literal tab character into the command line:
ls -al | cut -d"<Tab>" -f5,9    # <Tab> is the tab typed with Ctrl-V Tab
                
That makes the delimiter a tab. But we still don't get what we want (ls pads its columns with spaces, not tabs), so let's try squeezing multiple spaces into a single space in this particular output. Thus:
ls -la | tr -s ' ' | cut -d' ' -f5,9
3. And hopefully that should now produce the output we're after. If it produces the output we're after on your system, then we're ready for lift-off. If it doesn't, then try the command again.
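Since real ls output differs between systems and locales, here is the squeeze-then-cut step run on two made-up lines in ls -l layout, so the result can be checked. It takes field 5 (size) and field 9 (name), following the field table above. Be aware that this breaks for file names containing blanks, because only the first word of such a name lands in field 9:

```sh
#!/bin/sh
# Made-up lines in "ls -l" layout; real ls output varies by system and locale.
sample='-rw-r--r--  1 joan staff   1234 Mar  3 10:15 notes.txt
drwxr-xr-x 12 joan staff    384 Nov 21 09:02 src'

# Without tr -s the padding blanks create empty fields and the numbers are off.
printf '%s\n' "$sample" | tr -s ' ' | cut -d ' ' -f 5,9
# prints: 1234 notes.txt, then 384 src
```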

Now what happens if we want to swap the name with the size? I'll leave that as an exercise for you.
Exercises:
Using the tr and the cut commands, perform the following:
Obtain the mount point, the percentage in use and the partition of that mount on your disk drive to produce the following:
/dev/hdb2 80% /home
                      
Replace the spaces in your output above by colons (:)
Remove the /dev/shm line
Keep all output from the running of this command for later use.
As root, make the following change:^[7]
chmod o+r /dev/hda 
                        
Now, obtain the Model and Serial Number of your hard disk, using the command hdparm.
Obtain the stats (reads, writes, etc.) on your drive using the iostat command, keeping the output as a comma-separated values (CSV) file for later use.
