How Google Processes Massive Data

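MapReduce is a programming model Google designed for processing massive data sets. It splits a data-processing job into two phases, Map and Reduce: the Map operation processes key/value pairs to produce intermediate key/value pairs, and the Reduce operation merges all intermediate values that share the same intermediate key. The model lends itself to parallel execution. The run-time system automatically handles data partitioning, task scheduling, machine failures, and inter-machine communication, so programmers with no distributed-systems experience can still use the resources of a large distributed system. Google's MapReduce implementation processes many terabytes of data on thousands of machines, and more than a thousand MapReduce jobs run every day.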

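MapReduce is a method Google invented for processing massive amounts of data. In essence, it breaks the processing of a huge data set in a distributed environment into many small computation tasks. It consists of two main parts:
1. the Map operation;
2. the Reduce operation.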
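To make the two operations concrete, here is a minimal single-process sketch of the classic word-count job, written in Python. Everything runs in one process: the grouping step between the two phases stands in for the distributed "shuffle" a real cluster performs, and the function names (`map_words`, `reduce_counts`, `word_count`) are hypothetical illustrations, not Google's API.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_words(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document.

    doc_name is the input key; word count does not need it.
    """
    for word in text.split():
        yield word.lower(), 1


def reduce_counts(word: str, counts: List[int]) -> int:
    """Reduce: sum all the partial counts emitted for one word."""
    return sum(counts)


def word_count(docs: Dict[str, str]) -> Dict[str, int]:
    # Map phase: run the map function over every input record.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for name, text in docs.items():
        for word, one in map_words(name, text):
            # Stand-in for the shuffle: group intermediate values by key.
            intermediate[word].append(one)
    # Reduce phase: one reduce call per distinct intermediate key.
    return {word: reduce_counts(word, counts) for word, counts in intermediate.items()}


if __name__ == "__main__":
    docs = {"a.txt": "the quick brown fox", "b.txt": "The lazy dog the end"}
    print(word_count(docs))
```

Note that the user only writes `map_words` and `reduce_counts`; in a real MapReduce system the loop inside `word_count` is the part the framework parallelizes across machines.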

The original text follows:

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
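In the full paper the two user-supplied functions have the types map (k1, v1) → list(k2, v2) and reduce (k2, list(v2)) → list(v2). The Python sketch below is a hypothetical generic in-memory driver, `run_mapreduce`, that follows those shapes, applied to an inverted index (one of the example tasks discussed in the paper). As a simplifying assumption it lets reduce return any value rather than strictly a list, and it runs in a single process with no partitioning or fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")
Out = TypeVar("Out")


def run_mapreduce(
    records: Iterable[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reduce_fn: Callable[[K2, List[V2]], Out],
) -> Dict[K2, Out]:
    """Hypothetical driver: map every record, group values by key, reduce each group."""
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


def index_map(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (word, document ID) for every word in the document."""
    for word in text.split():
        yield word.lower(), doc_id


def index_reduce(word: str, doc_ids: List[str]) -> List[str]:
    """Reduce: sorted, de-duplicated list of the documents containing the word."""
    return sorted(set(doc_ids))


if __name__ == "__main__":
    docs = [("d1", "map and reduce"), ("d2", "reduce the data"), ("d3", "map the data")]
    print(run_mapreduce(docs, index_map, index_reduce))
```

Because the driver is generic, switching from an inverted index to another task means supplying a different `map_fn`/`reduce_fn` pair; the driver itself does not change.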

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
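Two of the run-time duties listed above, partitioning and handling failures, can be illustrated with a small single-process sketch. It assumes the default partitioning rule the paper describes, hash(key) mod R (R being the number of reduce tasks), and models failure handling simply as re-running a task that raised an error. The names `partition`, `map_task`, `run_with_retries`, `make_flaky` and `run_job` are hypothetical, and the "crashes" are simulated exceptions, not real machine failures.

```python
import functools
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Buckets = Dict[int, List[Tuple[str, int]]]


def partition(key: str, num_reduce_tasks: int) -> int:
    """hash(key) mod R, using crc32 so every process computes the same result.

    Python's built-in hash() of a str is salted per process, so it could
    send the same key to different reduce tasks on different machines.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def map_task(split: str, num_reduce_tasks: int) -> Buckets:
    """Word-count map task whose output is pre-split into R buckets."""
    buckets: Buckets = defaultdict(list)
    for word in split.split():
        buckets[partition(word, num_reduce_tasks)].append((word, 1))
    return buckets


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-execute a failed task, standing in for rescheduling it on another worker."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"task failed {max_attempts} times") from last_error


def make_flaky(fn: Callable[[], T], failures: int) -> Callable[[], T]:
    """Hypothetical helper: the first `failures` calls simulate a worker crash."""
    remaining = failures

    def attempt() -> T:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("simulated worker crash")
        return fn()

    return attempt


def run_job(splits: List[str], num_reduce_tasks: int, crash_first: int = 0) -> Dict[str, int]:
    # Map phase: one task per input split; the first `crash_first` tasks fail once.
    partitions: Buckets = defaultdict(list)
    for i, split in enumerate(splits):
        task: Callable[[], Buckets] = functools.partial(map_task, split, num_reduce_tasks)
        if i < crash_first:
            task = make_flaky(task, failures=1)
        for r, pairs in run_with_retries(task).items():
            partitions[r].extend(pairs)
    # Reduce phase: one task per partition; each key lives in exactly one partition.
    counts: Dict[str, int] = {}
    for r in range(num_reduce_tasks):
        grouped: Dict[str, int] = defaultdict(int)
        for word, one in partitions[r]:
            grouped[word] += one
        counts.update(grouped)
    return counts


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"], num_reduce_tasks=3, crash_first=2))
```

A map task's output is merged only after the task succeeds, so a retried task never contributes duplicate pairs, and the deterministic partitioner guarantees that all values for one key meet in the same reduce task.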

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

Appeared in:
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
