Linux uniq and sort: deduplicating and sorting

cherry_hit_tom

License: CC 4.0 BY-SA

Categories: shell, Linux学习 (Linux study). Tags: linux, integer, character, numbers, output, input

Article link: https://blog.youkuaiyun.com/tsuliuchao/article/details/8073106

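This article covers advanced usage of the uniq and sort commands on Linux: how to use uniq to deduplicate file contents and count repetitions, and how to use sort to order lines by several column values, with particular attention to reverse ordering and to specifying a field separator.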

Given the following file a.txt:

[root@m95] /ftproot# cat a.txt
ttt|000001
uuu|000002
uuu|000002
uuu|000002
uuu|000002
1
2
3
4
5
6
7
77
8
9
9

=====================================

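# cat a.txt | uniq -c -i | sort -k2 -n     Deduplicate, then sort the uniq output by its second column in ascending numeric order
# cat a.txt | uniq -c -i | sort -k2 -rn    Deduplicate, then sort the uniq output by its second column in descending numeric order

Note that uniq only merges adjacent identical lines. The duplicates in a.txt are already adjacent, so no prior sort is needed here.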
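
Because uniq compares each line only with the one before it, input whose duplicates are scattered usually has to be sorted first. The following is a minimal sketch of that difference; the temporary file and its contents are hypothetical sample data, not taken from a.txt, and awk is used only to normalise the leading spaces that uniq -c prints.

```shell
# Hypothetical sample data; the file name and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' b a b a a > "$tmpdir/sample.txt"

# uniq alone merges only adjacent duplicates, so "b" and "a" each show up twice.
adjacent_only=$(uniq -c "$tmpdir/sample.txt" | awk '{print $1, $2}')

# Sorting first makes identical lines adjacent, so each distinct line is counted once.
# The final sort orders by the count column (field 1), largest count first.
counted=$(sort "$tmpdir/sample.txt" | uniq -c | sort -k1,1nr | awk '{print $1, $2}')

printf '%s\n' "$adjacent_only"
echo "---"
printf '%s\n' "$counted"

rm -rf "$tmpdir"
```

The first listing still contains repeated values, while the second has one line per distinct value together with its total count.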

uniq options explained

-c counts how many times each line is repeated

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing
             comparisons. A field is a string of non-blank characters
             separated from adjacent fields by blanks. Field numbers are
             one based, i.e., the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons. If specified in conjunction with the -f option, the
             first chars characters after the first num fields will be
             ignored. Character numbers are one based, i.e., the first
             character is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.
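
To make the -d, -u and -i options above more concrete, here is a small sketch. The input lines are hypothetical and are deliberately arranged so that duplicates are adjacent; awk is used only to trim the leading spaces of the uniq -c output.

```shell
# Hypothetical input, already grouped so that duplicates are adjacent.
input=$(printf '%s\n' apple apple Banana banana cherry)

# -d: print one copy of each line that is repeated in the input.
repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: print only the lines that are not repeated.
unique_only=$(printf '%s\n' "$input" | uniq -u)

# -i: compare case-insensitively, so "Banana" and "banana" form one group.
folded=$(printf '%s\n' "$input" | uniq -i -c | awk '{print $1, $2}')

echo "repeated:    $repeated"
echo "unique only: $(printf '%s' "$unique_only" | tr '\n' ' ')"
printf '%s\n' "$folded"
```

Without -i, "Banana" and "banana" are treated as different lines, which is why both appear in the -u result.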

=============================================================================

Advanced usage of the Linux sort command (sorting by multiple column values)

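Sorting whole lines with sort is simple.

Sorting by several column values is harder to write, especially when TAB is the field separator and some of the columns have to be sorted in reverse order.

Take the following file contents, with fields separated by [TAB]: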

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
200        1000          oranges     10
200        900           bananas     5
200        1000          pears       8
200        1000          lemons      10
200        900           figs        4
190        700           grapes      17

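The data is to be sorted by the following columns, in this order (column 4 is sorted before column 3, and column 4 is in reverse order):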

    * Group ID (integer)
    * Category ID (integer)
    * Frequency “sorted in reverse order” (integer)
    * Text (alpha-numeric)

The sorted result should be:

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
190        700           grapes      17
200        900           bananas     5
200        900           figs        4
200        1000          lemons      10
200        1000          oranges     10
200        1000          pears       8

The sort command can solve this directly:

BASH CODE

sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 <my-file>

Explanation:

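-t $'\t': use TAB as the field separator
-k 1,1: sort by the value of the first column only. With a single 1 (-k 1), the sort key starts at the first column and runs to the end of the line
n: numeric order. The default is dictionary (lexicographic) order, in which 10 < 2
r: reverse order. The default is ascending order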
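
The effect of the n and r modifiers can be checked with a short sketch. The four numbers are hypothetical sample values, and LC_ALL=C is set on the first command as an assumption to get plain byte-wise dictionary order regardless of the local locale.

```shell
# Hypothetical sample values.
numbers=$(printf '%s\n' 10 2 33 4)

# Default dictionary order compares character by character, so "10" sorts before "2".
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort | tr '\n' ' ')

# -n compares the lines as numbers.
numeric=$(printf '%s\n' "$numbers" | sort -n | tr '\n' ' ')

# -r reverses the comparison; combined with -n this gives descending numeric order.
reversed=$(printf '%s\n' "$numbers" | sort -rn | tr '\n' ' ')

echo "dictionary: $lexical"
echo "numeric:    $numeric"
echo "reversed:   $reversed"
```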

So the final command is: sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 my-file
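
As a way to try the command end to end, the sketch below writes the six data rows from the table (without the header lines) to a temporary file and sorts them. The temporary file is hypothetical, the key modifiers are written after the field range (-k4,4rn), which is equivalent to the -k4rn,4 form above, and "$(printf '\t')" is used as an assumption-free substitute for $'\t' in shells that do not support that bash quoting. LC_ALL=C is set so that the text column is compared byte-wise.

```shell
# Write the TAB-separated sample rows to a temporary file (header lines omitted).
tmpfile=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  200 1000 oranges 10 \
  200 900  bananas 5 \
  200 1000 pears   8 \
  200 1000 lemons  10 \
  200 900  figs    4 \
  190 700  grapes  17 > "$tmpfile"

# Keys in priority order: column 1 numeric, column 2 numeric,
# column 4 numeric and reversed, column 3 in dictionary order.
tab=$(printf '\t')
sorted=$(LC_ALL=C sort -t "$tab" -k1,1n -k2,2n -k4,4rn -k3,3 "$tmpfile")

printf '%s\n' "$sorted"
rm -f "$tmpfile"
```

The rows should come out in the same order as the expected result listed earlier: grapes, bananas, figs, lemons, oranges, pears.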