Hadoop是Apache提出的一个软件框架(即:开放源码并行运算编程工具和分布式文件系统,与MapReduce和Google档案系统的概念类似)...

Hadoop是一个支持数据密集型分布式应用的开源软件框架,允许使用简单的编程模型处理大型数据集。它能够从单一服务器扩展到成千上万台机器,每台机器提供本地计算和存储。Hadoop的设计能够检测和处理应用程序级别的故障,从而在由可能故障的计算机组成的集群上提供高可用性服务。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Apache Hadoopis asoftware frameworkthat supports data-intensivedistributed applicationsunder afree license.[1]It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired byGoogle'sMapReduceandGoogle File System(GFS) papers.

Hadoop is a top-levelApacheproject being built and used by a global community of contributors,[2]using theJavaprogramming language.Yahoo!has been the largest contributor[3]to the project, and uses Hadoop extensively across its businesses.[4]

Hadoop was created byDoug Cutting,[5]who named it after his son's toy elephant.[6]It was originally developed to support distribution for theNutchsearch engine project.[7]

Contents

[hide]

http://hadoop.apache.org/

What Is Apache Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-avaiability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-availabile service on top of a cluster of computers, each of which may be prone to failures.

The project includes these subprojects:

Other Hadoop-related projects at Apache include:

  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A Scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • ZooKeeper™: A high-performance coordination service for distributed applications.

Who Uses Hadoop?

A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the HadoopPoweredBywiki page.

http://www.oschina.net/p/hadoop

Hadoop并不仅仅是一个用于存储的分布式文件系统,而是设计用来在由通用计算设备组成的大型集群上执行分布式应用的框架。

下图是Hadoop的体系结构:

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值