Friday, December 17, 2010
Hadoop cluster at eBay
I am always curious to know how other companies install their Hadoop clusters and how they use the ecosystem around it. Since Hadoop is still relatively new, there are no established best practices; every company builds what it thinks is the best infrastructure for its Hadoop cluster.
At the Hadoop NYC 2010 conference, eBay showcased their implementation of a Hadoop production cluster. Following are some tidbits on eBay's implementation of Hadoop.
- The JobTracker, NameNode, ZooKeeper, and HBase Master all run on enterprise-class nodes with Sun 64-bit architecture, running Red Hat Linux with 72 GB of RAM and 4 TB of disk.
- There are 4,000 datanodes, each running CentOS with 48 GB of RAM and 10 TB of disk.
- Ganglia and Nagios are used for monitoring and alerting; eBay is also building a custom solution to augment them.
- ETL is done mostly with Java MapReduce programs (see the sketch after this list)
- Pig is used to build data pipelines
- Hive is used for ad hoc queries
- Mahout is used for data mining
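Since the ETL work is mostly plain Java MapReduce, here is a minimal sketch of what such a job could look like. This is purely illustrative, not eBay's code: the tab-separated input layout, the field positions (item id in field 0, price in field 2), and the class names are all assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical ETL-style job: clean raw tab-separated records,
// then aggregate price totals per item id.
public class CleanAndAggregate {

    // Mapper: drop malformed rows, emit (itemId, price).
    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return; // skip records with too few fields
            }
            try {
                double price = Double.parseDouble(fields[2]);
                context.write(new Text(fields[0]), new DoubleWritable(price));
            } catch (NumberFormatException e) {
                // skip rows with an unparsable price
            }
        }
    }

    // Reducer: sum prices per item id.
    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "clean-and-aggregate");
        job.setJarByClass(CleanAndAggregate.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The appeal of writing ETL this way, rather than in Pig or Hive, is full control over parsing and error handling of messy raw records, at the cost of more code per pipeline stage.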
They are toying with the idea of using Oozie to manage workflows but haven't decided to adopt it yet.
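For context, driving Oozie from Java is fairly lightweight. Here is a minimal sketch using Oozie's client API; the server URL, the HDFS application path, and the property values are all placeholder assumptions, and the workflow itself would be a separate workflow.xml already deployed to HDFS.

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

// Hypothetical submission of a deployed Oozie workflow application.
public class SubmitWorkflow {
    public static void main(String[] args) throws OozieClientException {
        // Placeholder Oozie server URL
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        // Job properties: where the workflow app lives in HDFS, plus
        // values for parameters the workflow.xml expects (all placeholders).
        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH,
                "hdfs://namenode:8020/user/etl/workflows/daily-load");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker-host:8021");

        // Submit and start the workflow, then print its id for tracking.
        String jobId = client.run(conf);
        System.out.println("Workflow job submitted: " + jobId);
    }
}
```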
It looks like they are doing all the right things.