MapReduce is a software architecture proposed by Google for parallel computation over large data sets (larger than 1 TB). ...

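MapReduce is a software framework introduced by Google for distributed computation over large data sets. Inspired by the map and reduce functions of functional programming, it supports data processing across the many machines of a cluster. MapReduce splits a job into a "Map" phase and a "Reduce" phase: the former divides the input into small chunks and processes them, and the latter combines the results. The framework allows distributed processing and offers some ability to recover from partial failure of servers or storage.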


MapReduce is a patented[1] software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers.[2]

The framework is inspired by the map and reduce functions commonly used in functional programming,[3] although their purpose in the MapReduce framework is not the same as their original forms.[4]
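For comparison, the original functional forms can be sketched in Python with the built-in `map` and `functools.reduce`. This is only an illustration of the functional-programming primitives, not of any MapReduce library; the list `numbers` and the lambdas are made up for the example.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to every element independently
squares = list(map(lambda x: x * x, numbers))

# reduce: fold the whole sequence into one value with a binary function
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)  # prints "[1, 4, 9, 16] 30"
```

Here `reduce` folds one list into one value. In the MapReduce framework, by contrast, the map step emits key/value pairs and the reduce step runs once per group of values that share a key.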

MapReduce libraries have been written in C++, C#, Erlang, Java, OCaml, Perl, Python, PHP, Ruby, F#, R and other programming languages.

Overview

MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster (if all nodes use the same hardware) or as a grid (if the nodes use different hardware). Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).

"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output: the answer to the problem it was originally trying to solve.

The advantage of MapReduce is that it allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the data source and/or the number of CPUs near that data. Similarly, a set of 'reducers' can perform the reduction phase; all that is required is that all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to more sequential algorithms, MapReduce can be applied to significantly larger datasets than "commodity" servers can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data are still available.
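The ideas above (independent maps run in parallel, routing every pair with the same key to the same reducer, and rescheduling a failed task) can be sketched as a single-machine word count in Python. This is a toy under stated assumptions, not how Google's implementation works: threads stand in for worker nodes, and the names `map_chunk`, `partition`, `run_with_retry`, `reduce_partition` and `parallel_word_count` are hypothetical helpers invented for this example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map: emit a (word, 1) pair for each word in one input chunk."""
    return [(word, 1) for word in chunk.split()]


def partition(key, num_reducers):
    """Pick a reducer with a stable hash, so equal keys always meet."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_with_retry(task, arg, attempts=3):
    """Re-run a failed task; this assumes its input is still available."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except Exception:
            if attempt == attempts - 1:
                raise


def reduce_partition(pairs):
    """Reduce: sum the counts for every key routed to this reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def parallel_word_count(chunks, num_reducers=2):
    with ThreadPoolExecutor() as pool:
        # Each chunk is mapped independently, so maps can run concurrently.
        mapped = list(pool.map(lambda c: run_with_retry(map_chunk, c), chunks))

        # Shuffle: route each pair to the reducer that owns its key.
        buckets = [[] for _ in range(num_reducers)]
        for pairs in mapped:
            for key, value in pairs:
                buckets[partition(key, num_reducers)].append((key, value))

        # Reducers own disjoint key sets, so they can also run concurrently.
        reduced = list(pool.map(reduce_partition, buckets))

    result = {}
    for part in reduced:
        result.update(part)
    return result


if __name__ == "__main__":
    print(parallel_word_count(["a b a", "b c", "a"]))
```

Because each key belongs to exactly one partition, merging the reducer outputs with `update` never overwrites a count. A real deployment would move the buckets over the network and would re-read the input from distributed storage when rescheduling a failed worker.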

http://en.wikipedia.org/wiki/MapReduce
