Hadoop Official Website Translation, Day 1



The Apache™ Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop project is open-source software developed for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

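The Apache Hadoop software library is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of machines, each providing local computation and storage. Instead of relying on hardware for high availability, the library itself detects and handles failures at the application layer, so it delivers a highly available service on top of a cluster in which any machine may fail.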
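The idea of handling failures in software rather than in hardware can be illustrated with a toy scheduler: a task is tried on one node, and if that node fails, the application notices and reschedules the task on another node. This is a minimal Python sketch under stated assumptions, not Hadoop's actual implementation; `run_with_retries`, `NodeFailure`, the node names, and the simulated failure set are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


class NodeFailure(Exception):
    """Raised when a (simulated) node cannot finish a task."""


def run_with_retries(
    task: Callable[[str], int], nodes: Iterable[str], max_attempts: int = 3
) -> Tuple[str, int]:
    """Try `task` on successive nodes until one succeeds.

    Returns (node that succeeded, result). Raises RuntimeError if no
    attempt succeeds within `max_attempts` or the node list runs out.
    """
    attempts = 0
    for node in nodes:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            return node, task(node)
        except NodeFailure:
            # The failure is detected in software, so the task is simply
            # rescheduled on the next node instead of relying on the hardware.
            continue
    raise RuntimeError(f"task failed after {attempts} attempt(s)")


# Hypothetical cluster state: node-1 is down.
FAILED_NODES = {"node-1"}


def count_records(node: str) -> int:
    """Hypothetical task: pretend each healthy node processes 42 records."""
    if node in FAILED_NODES:
        raise NodeFailure(node)
    return 42


if __name__ == "__main__":
    print(run_with_retries(count_records, ["node-1", "node-2", "node-3"]))  # ('node-2', 42)
```

Because the retry logic lives in the application, the same code keeps working on commodity machines that fail from time to time. Hadoop applies a comparable idea when it re-runs failed tasks, although its real mechanism is considerably more involved.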

 

Modules

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.

        Common utilities that support the other Hadoop modules.

  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

        A distributed file system offering high-throughput access to application data.

  • Hadoop YARN: A framework for job scheduling and cluster resource management.

       A framework for scheduling jobs and managing cluster resources.

  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

       A YARN-based system for processing large data sets in parallel.

  • Hadoop Ozone: An object store for Hadoop.
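To make the MapReduce programming model concrete, here is a single-process Python sketch of the classic word count. The input is cut into blocks of whole lines (standing in for the input splits a real job reads from HDFS), a map function emits (word, 1) pairs per block, a shuffle step groups the pairs by word, and a reduce function sums each group. It only imitates the data flow: real jobs are written against Hadoop's Java API and run as distributed tasks scheduled by YARN. The function names and the tiny block size are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def split_into_blocks(lines: List[str], lines_per_block: int) -> List[List[str]]:
    """Cut the input into blocks of whole lines (a stand-in for input splits)."""
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]


def map_block(block: List[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one block."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterator[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: List[str], lines_per_block: int = 2) -> Dict[str, int]:
    # On a cluster, the map task for each block could run on a different node,
    # ideally the one that already stores that block locally.
    mapped = (
        pair
        for block in split_into_blocks(lines, lines_per_block)
        for pair in map_block(block)
    )
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    text = ["hadoop stores data", "yarn schedules jobs", "hadoop processes data"]
    print(word_count(text))
```

The map and reduce functions never see the whole input, only one block or one key group; that restriction is what lets a framework spread the work across many machines.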

 

 Related projects

Other Hadoop-related projects at Apache include:

  • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.

       A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a dashboard for checking cluster health (for example, heatmaps) and lets users view MapReduce, Pig, and Hive applications visually, with features for diagnosing their performance characteristics in a user-friendly way.

  • Avro™: A data serialization system.

    A data serialization system.

  • Cassandra™: A scalable multi-master database with no single points of failure.

    A scalable multi-master database with no single point of failure.

  • Chukwa™: A data collection system for managing large distributed systems.

      A data collection system for managing large distributed systems.

  • HBase™: A scalable, distributed database that supports structured data storage for large tables.

     A scalable, distributed database that supports structured data storage for large tables.

  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.

      A data warehouse infrastructure that provides data summarization and ad hoc (on-demand) querying.

  • Mahout™: A scalable machine learning and data mining library.

       A scalable library for machine learning and data mining.

  • Pig™: A high-level data-flow language and execution framework for parallel computation.

      A high-level data-flow language and execution framework for parallel computation.

  • Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.

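         A fast, general-purpose compute engine for Hadoop data. Spark provides a simple, expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.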

  • Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.

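       A general-purpose data-flow programming framework built on Hadoop YARN. It provides a powerful, flexible engine that executes an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted by Hive, Pig, other frameworks in the Hadoop ecosystem, and commercial software (for example, ETL tools) to replace Hadoop MapReduce as the underlying execution engine.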

  • ZooKeeper™: A high-performance coordination service for distributed applications.

      A high-performance coordination service for distributed applications.
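Executing "an arbitrary DAG of tasks", as Tez does, comes down to starting each task only after all of its upstream tasks have finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to run a small DAG in dependency order. The `run_dag` function and the ETL-style task names are hypothetical; Tez itself also runs independent tasks in parallel on YARN and moves data between them, which this single-threaded sketch does not attempt.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(deps: Dict[str, Set[str]], run: Callable[[str], None]) -> List[str]:
    """Run every task after all of its dependencies; return the execution order.

    `deps` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    try:
        sorter.prepare()
    except CycleError as exc:
        raise ValueError("task graph contains a cycle, so it is not a DAG") from exc
    order: List[str] = []
    while sorter.is_active():
        # Every task returned in one batch has no unfinished dependencies,
        # so a real engine could run the whole batch in parallel.
        for task in sorter.get_ready():
            run(task)
            order.append(task)
            sorter.done(task)
    return order


if __name__ == "__main__":
    # Hypothetical ETL-style DAG: two extracts feed a join, which feeds an aggregate.
    dag = {
        "join": {"extract_orders", "extract_users"},
        "aggregate": {"join"},
        "write_report": {"aggregate"},
    }
    print(run_dag(dag, lambda task: print("running", task)))
```

The two extract tasks are returned in the same ready batch, which is where a DAG engine gains over a fixed map-then-reduce pipeline: independent branches need not wait for each other.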

 

