Baidu/Taobao Hadoop

本文对比了百度和淘宝两大互联网巨头使用Hadoop的实际案例。百度采用C++版本的HCE,拥有4000节点集群,每日数据生成量达到3PB。而淘宝的“云梯”则由1600多个节点组成,总容量27.79PB,每天处理40000个作业。两家公司都在原有基础上进行了不同程度的定制和优化。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

百度和淘宝是目前国内Hadoop的最大使用者,在NoSQL fan上看了百度和淘宝Hadoop集群的揭秘,总结一下。链接分别如下:

 

百度Hadoop分布式系统揭秘:4000节点集群: http://blog.nosqlfan.com/html/983.html

淘宝Hadoop数据分析实践:http://www.slideshare.net/coderplay/hadoop-9256433

 

百度:

百度一向是走C++系的,其Hadoop平台也是采用的C++版的HCE,而且值得注意的是百度是HyperTable的主要赞助者,HyperTable是C++版的HBase。可见百度对于Java的性能还是有些不放心吧,呵呵。

 

百度的Hadoop集群主要用于后端数据训练和计算,目前规模如下:

4000节点,10+个集群,最大集群1000+节点;

单节点配置:8 core CPU,16G 内存,12T硬盘

数据生成量:3PB/天

 

百度对HCE也进行了一些优化,例如:调度器是在capacity-scheduler的基础上根据自身业务改进的,对shuffle流程也进行了大幅改造。

 

淘宝:

一直比较崇拜淘宝,想当当年找工作真正见识到了淘宝面试官的技术实力,那叫一个牛啊,面完之后,自信心严重受挫......希望有朝一日也能进淘宝吧。言归正传,淘宝极具分享精神,在很多会议上和论坛上都发布了其Hadoop实践经验。

 

淘宝的Hadoop集群成为“云梯”,主要也是用于数据的分析。目前配置:

1600+节点,总容量27.79PB,6.6千万个file,每台机器12T/24T。

40000job/天,扫描数据1.7PB/天,产生数据255TB/天

用户数820个,用户组67个

淘宝也根据其自由业务对Hadoop进行了很多优化,具体细节见slideshare的ppt。

root@job-da8abcdd-9948-4878-9d20-371dceb00ee1-master-0:/home# start-dfs.sh Starting namenodes on [master] /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh. Starting datanodes /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh. Starting secondary namenodes [job-da8abcdd-9948-4878-9d20-371dceb00ee1-master-0] /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh.
07-11
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值