The Apache™ Hadoop
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
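The idea of handling failures in software rather than in hardware can be illustrated with a minimal Python sketch. It is not Hadoop code: `NodeFailure`, `run_with_failover`, and the node names are hypothetical, and real Hadoop components use far richer mechanisms. The sketch only shows a task being retried on another machine once a failure is detected.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class NodeFailure(Exception):
    """Raised by a (hypothetical) worker node that cannot complete a task."""


def run_with_failover(task: Callable[[str], T], nodes: Iterable[str]) -> T:
    """Try the task on each node in turn and return the first successful result."""
    failed: List[str] = []
    for node in nodes:
        try:
            return task(node)
        except NodeFailure:
            # The failure is detected in software; move on to the next node.
            failed.append(node)
    raise RuntimeError(f"task failed on every node: {failed}")


def flaky_task(node: str) -> str:
    """Illustrative task that only succeeds on one assumed-healthy node."""
    if node != "node-3":
        raise NodeFailure(node)
    return f"done on {node}"


if __name__ == "__main__":
    print(run_with_failover(flaky_task, ["node-1", "node-2", "node-3"]))
```

The caller never needs to know which machines failed; availability comes from the retry logic, not from any single node being reliable.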
Modules
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
- Hadoop Ozone: An object store for Hadoop.
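To give a feel for the programming model behind the MapReduce module listed above, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration. In a real cluster the map and reduce steps would run in parallel on many machines, and the framework would perform the grouping step between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can be distributed across machines without coordination beyond the grouping step.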
Related projects
Other Hadoop-related projects at Apache include:
- Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database with no single points of failure.
- Chukwa™: A data collection system for managing large distributed systems.
- HBase™: A scalable, distributed database that supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout™: A scalable machine learning and data mining library.
- Pig™: A high-level data-flow language and execution framework for parallel computation.
- Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
- ZooKeeper™: A high-performance coordination service for distributed applications.
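The Tez entry above mentions executing "an arbitrary DAG of tasks". As a rough illustration of what that means, the following Python sketch runs a small set of tasks in dependency order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not the Tez API; `run_dag` and the task names are hypothetical, and a real engine would also run independent tasks in parallel across the cluster.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set


def run_dag(
    tasks: Dict[str, Callable[[Dict[str, Any]], Any]],
    dependencies: Dict[str, Set[str]],
) -> Dict[str, Any]:
    """Run every task after its upstream tasks, passing their results as input."""
    graph = {name: dependencies.get(name, set()) for name in tasks}
    results: Dict[str, Any] = {}
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return results


# Hypothetical four-task pipeline: one source, two branches, one join.
example_tasks = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: (inputs["sort"], inputs["total"]),
}
example_dependencies = {
    "sort": {"load"},
    "total": {"load"},
    "report": {"sort", "total"},
}

if __name__ == "__main__":
    print(run_dag(example_tasks, example_dependencies)["report"])
```

Unlike a fixed map-then-reduce pipeline, the graph here can have any shape, as long as it contains no cycles; `sort` and `total` are independent of each other and could run concurrently.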