对无标题的文本做行列转置

博客讨论了如何将一个以制表符分隔的文件中,根据第一列的值将第二列的内容转置为多列。提出了使用Unix命令或SPL编程语言来实现这个转换,其中SPL的代码简洁明了,能够完成数据分组和行转列的操作。

【问题】
I have a tab delimited file that looks as follows:

cat my file.txt

gives:

 1 299
 1 150
 1 50
 1 57
 2 -45
 2 62
 3 515
 3 215
 3 -315
 3 -35
 3 3
 3 6789
 3 34
 5 66
 5 1334
 5 123

I’d like to use Unix commands to get a tab-delimited file that based on values in column#1, each column of the output file will hold all relevant values of column#2 (I’m using here for the example the separator “|” instead of tab only to illustrate my desired output file):

299 | -45| 515| 66
150| 62| 215| 1334
50 | | -315 |
57 | | -35 |
| | 3 |

The corresponding Headers (1,2,3,5; based on column#1 values) could be a nice addition to the code (as shown below), but the main request is to split the information of the first file into separated columns. Thanks!

1| 2| 3| 5
299 | -45| 515| 66
150| 62| 215| 1334
50 | | -315 |
57 | | -35 |
| | 3 |

别人解答:

awk '
BEGIN{max=0;}
{
 d\[$1\]\[length(d\[$1\])+1\] = $2;
 if(length(d\[$1\])>max)
 max = length(d\[$1\]);
}
END{
 PROCINFO\["sorted\_in"\] = "@ind\_num_asc";
 line = "";
 flag = 0;
 for(j in d){
 line = line (flag?"\\t|\\t":"") j;
 flag = 1;
 }
 print line;
 for(i=1; i<=max; ++i){
 line = "";
 flag = 0;
 for(j in d){
 line = line (flag?"\\t|\\t":"") d\[j\]\[i\];
 flag = 1;
 }
 print line;
 }
}' file.txt

you get

1 | 2 | 3 | 5
299 | -45 | 515 | 66
150 | 62 | 215 | 1334
50 | | -315 |
57 | | -35 |
 | | 3 |

Or, you can use python …. for example, in split2Columns.py

import sys

records = \[line.split() for line in open(sys.argv\[1\])\]

import collections

records_dict = collections.defaultdict(list)

for key, val in records:

 records_dict\[key\].append(val)

from itertools import izip_longest

print "\\t|\\t".join(records_dict.keys())

print "\\n".join(("\\t|\\t".join(map(str,l)) for l in izip\_longest(*records\_dict.values(), fillvalue="")))

【回答】
先将数据按第 1 字段分组,生成期望结果集,然后将分组成员竖着填入结果集。

上述算法涉及分组后运算及行转列,如无特殊要求可用 SPL 实现,代码简单易懂:

A
1=file("D:/file.txt").import()
2=A1.group(#1)
3=create(${A2.(#1).concat(",")}).paste(${A2.len().("A2("/~/").(#2)").concat(",")})

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值