Apache Hadoop Wins Terabyte Sort Benchmark

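Yahoo used a Hadoop cluster to fully sort 1 TB of data in 209 seconds, beating the previous record of 297 seconds. The sort covered 10 billion 100-byte records, with the results written to disk. This is the first time that either a Java program or an open source program has won the annual general purpose terabyte sort benchmark.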


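1 TB of data was sorted in 209 seconds, breaking the previous record of 297 seconds.

The input was 10 billion 100-byte records.

Yahoo runs Hadoop on more than 13,000 nodes.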

 

One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, which beat the previous record of 297 seconds in the annual general purpose (Daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
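
To put the headline numbers in perspective, the following is a small back-of-the-envelope calculation using only the figures quoted above. The variable names are illustrative, and "TB" and "GB" are taken as decimal units (10^12 and 10^9 bytes), which is an assumption about how the benchmark counts its terabyte.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the post.
RECORDS = 10_000_000_000   # 10 billion records
RECORD_BYTES = 100         # bytes per record
NEW_TIME_S = 209           # Hadoop run, in seconds
OLD_TIME_S = 297           # previous record, in seconds

# Total input size: 10 billion * 100 bytes = 10**12 bytes (1 TB, decimal).
total_bytes = RECORDS * RECORD_BYTES

# Aggregate sort rate across the whole cluster, in decimal GB per second.
throughput_gb_s = total_bytes / NEW_TIME_S / 1e9

# How much faster the new run is than the previous record.
speedup = OLD_TIME_S / NEW_TIME_S

print(f"input size: {total_bytes / 1e12:.1f} TB")
print(f"aggregate throughput: {throughput_gb_s:.2f} GB/s")
print(f"speedup over previous record: {speedup:.2f}x")
```

Under these assumptions the cluster as a whole sustained a little under 5 GB/s end to end, roughly 1.4 times the rate of the previous record.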

The cluster statistics were:

  • 910 nodes
  • 2 quad-core Xeons @ 2.0 GHz per node
  • 4 SATA disks per node
  • 8 GB RAM per node
  • 1 gigabit ethernet on each node
  • 40 nodes per rack
  • 8 gigabit ethernet uplinks from each rack to the core
  • Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
  • Sun Java JDK 1.6.0_05-b13
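
The per-node figures above can be rolled up into cluster-wide totals. The sketch below does that arithmetic; it assumes racks are filled to 40 nodes and that the 1 TB input is spread evenly across all nodes, neither of which the post states explicitly.

```python
import math

# Per-node hardware figures taken from the list above.
NODES = 910
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
DISKS_PER_NODE = 4
RAM_GB_PER_NODE = 8
NODES_PER_RACK = 40
INPUT_BYTES = 10**12  # 1 TB input, decimal units (assumption)

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
total_disks = NODES * DISKS_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE

# Assumes full racks; the last rack would be partially filled.
racks = math.ceil(NODES / NODES_PER_RACK)

# Assumes the input is distributed evenly across nodes.
data_per_node_gb = INPUT_BYTES / NODES / 1e9

print(f"cores: {total_cores}, disks: {total_disks}, RAM: {total_ram_gb} GB")
print(f"racks (at least): {racks}")
print(f"input per node: {data_per_node_gb:.2f} GB")
```

With an even split, each node handles roughly 1.1 GB of input, well under its 8 GB of RAM.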

The benchmark was run with Hadoop trunk (pre-0.18) with a couple of optimization patches to remove intermediate writes to disk. The sort used 1800 maps and 1800 reduces and allocated enough buffer memory to hold the intermediate data in memory. All of the code for the benchmark has been checked in as a Hadoop example.
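
For readers wondering how many independent reduces can produce one globally sorted result, one common approach is range partitioning on sampled keys: each reduce receives a contiguous key range, sorts it locally, and the outputs concatenate in order. The following is a toy, single-process sketch of that idea, not the code checked in as the Hadoop example. All function names are hypothetical, and plain integers stand in for the 100-byte records.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Pick num_reducers - 1 keys that cut the sorted sample into equal parts."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Return the index of the reducer whose key range contains key."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(records, num_reducers, sample_size=1000, seed=0):
    """Route records to reducers by key range, then sort each range locally."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    splits = choose_split_points(sample, num_reducers)

    # "Map" side: send every record to the bucket for its key range.
    buckets = [[] for _ in range(num_reducers)]
    for rec in records:
        buckets[partition(rec, splits)].append(rec)

    # "Reduce" side: each bucket is sorted independently of the others.
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randrange(1_000_000) for _ in range(10_000)]
    parts = total_order_sort(data, num_reducers=8)
    merged = [x for part in parts for x in part]
    print("globally sorted:", merged == sorted(data))
    print("bucket sizes:", [len(p) for p in parts])
```

Because the bucket ranges do not overlap and are ordered, no final merge step is needed; the quality of the sample only affects how evenly the work is balanced, not correctness.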
