Apache Hadoop Wins Terabyte Sort Benchmark

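Yahoo sorted 1 TB of data in 209 seconds on a Hadoop cluster, beating the previous record of 297 seconds. The sort covered 10 billion 100-byte records and wrote the result to disk. This is the first time that either a Java program or an open source program has won the annual general purpose terabyte sort benchmark.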


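Sorting 1 terabyte of data finished in 209 seconds, breaking the previous record of 297 seconds.

The input was 10 billion records of 100 bytes each.

Yahoo runs Hadoop on more than 13,000 nodes.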

One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, which beat the previous record of 297 seconds in the annual general purpose (Daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
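As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. All inputs come from the paragraph above; the variable names are illustrative, and "terabyte" is taken here to mean the decimal 10^12 bytes, which is what 10 billion 100-byte records add up to.

```python
# Figures quoted in the announcement.
RECORDS = 10_000_000_000   # 10 billion records
RECORD_BYTES = 100         # bytes per record
ELAPSED_S = 209            # winning run, in seconds
PREVIOUS_S = 297           # previous record, in seconds

# Total input size: 10 billion * 100 bytes = 10**12 bytes (1 decimal TB).
total_bytes = RECORDS * RECORD_BYTES

# Aggregate sort rate across the whole cluster, in decimal GB per second.
throughput_gb_s = total_bytes / ELAPSED_S / 1e9

# How much faster the new run is than the previous record.
speedup = PREVIOUS_S / ELAPSED_S

print(f"input size: {total_bytes / 1e12:.1f} TB")
print(f"aggregate sort rate: {throughput_gb_s:.2f} GB/s")
print(f"speedup over previous record: {speedup:.2f}x")
```

The run works out to roughly 4.78 GB/s across the cluster, about 1.42 times faster than the previous record.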

The cluster statistics were:

910 nodes
2 quad-core Xeons @ 2.0 GHz per node
4 SATA disks per node
8 GB RAM per node
1 gigabit Ethernet on each node
40 nodes per rack
8 gigabit Ethernet uplinks from each rack to the core
Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
Sun Java JDK 1.6.0_05-b13
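
To get a feel for the cluster as a whole, the per-node numbers above can be added up. This is a rough sketch, not part of the original post. It assumes that each of the 8 rack uplinks runs at 1 Gbit/s and that racks are filled to 40 nodes before a new one is started; neither detail is spelled out in the list.

```python
import math

# Per-node and per-rack figures from the cluster statistics.
NODES = 910
CORES_PER_NODE = 2 * 4      # 2 quad-core Xeons
DISKS_PER_NODE = 4
RAM_GB_PER_NODE = 8
NODES_PER_RACK = 40
NODE_LINK_GBIT = 1          # 1 gigabit Ethernet per node
UPLINKS_PER_RACK = 8
UPLINK_GBIT = 1             # assumption: each uplink is 1 Gbit/s

# Cluster-wide totals.
total_cores = NODES * CORES_PER_NODE
total_disks = NODES * DISKS_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE

# Assumption: racks are filled to capacity, so the last rack is partial.
racks = math.ceil(NODES / NODES_PER_RACK)

# Ratio of bandwidth the nodes in a full rack can offer to what the
# rack's uplinks can carry to the core.
oversubscription = (NODES_PER_RACK * NODE_LINK_GBIT) / (UPLINKS_PER_RACK * UPLINK_GBIT)

print(total_cores, total_disks, total_ram_gb, racks, oversubscription)
```

Under those assumptions the cluster has 7,280 cores, 3,640 disks and 7,280 GB of RAM spread over 23 racks, with a 5:1 oversubscription between a full rack and the core network.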

The benchmark was run with Hadoop trunk (pre-0.18) with a couple of optimization patches to remove intermediate writes to disk. The sort used 1800 maps and 1800 reduces, and its buffers were given enough memory to hold the intermediate data in RAM. All of the code for the benchmark has been checked in as a Hadoop example.
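
The claim that the intermediate data fits in memory can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only. It assumes the input is spread evenly over maps and nodes, and that the intermediate data is the same size as the input (a sort neither adds nor drops records); the post does not state the actual buffer settings.

```python
# Figures from the post.
TOTAL_BYTES = 10_000_000_000 * 100   # 10 billion 100-byte records
MAPS = 1800
REDUCES = 1800
NODES = 910
RAM_GB_PER_NODE = 8

# Assumption: input is split evenly across the map tasks.
bytes_per_map = TOTAL_BYTES / MAPS
records_per_map = 10_000_000_000 / MAPS

# Average number of map (and reduce) tasks each node has to run.
maps_per_node = MAPS / NODES
reduces_per_node = REDUCES / NODES

# Assumption: intermediate data equals input size and is spread evenly.
intermediate_gb_per_node = TOTAL_BYTES / NODES / 1e9
ram_fraction = intermediate_gb_per_node / RAM_GB_PER_NODE

print(f"per map: {bytes_per_map / 1e6:.1f} MB, {records_per_map / 1e6:.2f} million records")
print(f"tasks per node: {maps_per_node:.2f} maps, {reduces_per_node:.2f} reduces")
print(f"intermediate data per node: {intermediate_gb_per_node:.2f} GB "
      f"({ram_fraction:.1%} of RAM)")
```

With these assumptions each map handles about 556 MB, each node runs about two maps and two reduces, and a node's share of the intermediate data is about 1.1 GB, well under its 8 GB of RAM.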