MIT 6.824 Lab1 MapReduce

Latest recommended article published 2024-06-16 17:22:24

寒冰陨云

Category: MIT6.824 Distributed Systems. Tags: mapreduce, hadoop, big data

Article link: https://blog.youkuaiyun.com/weixin_46840831/article/details/121544638

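This post describes a basic implementation of MapReduce for MIT 6.824 Lab 1, including the data structures and responsibilities of the Coordinator and the Worker. It walks through Worker registration, task requests, and task execution, and discusses the crash-handling strategy and how to avoid executing the same task more than once. The complete code and lab details are in the MIT 6.824 lab code repository.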

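Overview

This article covers the basic implementation approach for Lab 1. The full lab requirements are in MIT-Lab1, and the lab code is in the Lab code repository.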

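Basic Requirements

One Coordinator manages multiple Workers, and they communicate via RPC.
Workers request tasks from the Coordinator, and the Coordinator assigns tasks to Workers.
The Coordinator can handle Worker crashes.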
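
The request/assign exchange in the list above can be sketched with Go's standard net/rpc package. This is a minimal illustration, not the lab's actual code: DemoCoordinator, TaskArgs, TaskReply, serve, and requestTask are hypothetical names, and the loopback TCP transport is an assumption made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// TaskArgs and TaskReply are hypothetical RPC message types for this sketch.
type TaskArgs struct {
	WorkerID string
}

type TaskReply struct {
	TaskID int
}

// DemoCoordinator is a hypothetical stand-in for the lab's Coordinator.
type DemoCoordinator struct {
	tasks chan int // pending task IDs
}

// RequestTask hands the next pending task to the calling worker.
// A reply with TaskID -1 means no task is currently available.
func (c *DemoCoordinator) RequestTask(args *TaskArgs, reply *TaskReply) error {
	select {
	case id := <-c.tasks:
		reply.TaskID = id
	default:
		reply.TaskID = -1
	}
	return nil
}

// serve registers c with a fresh RPC server and listens on a local TCP port.
func serve(c *DemoCoordinator) (net.Listener, error) {
	srv := rpc.NewServer()
	if err := srv.Register(c); err != nil {
		return nil, err
	}
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go srv.Accept(l) // serves connections until the listener is closed
	return l, nil
}

// requestTask is the worker side: dial the coordinator and ask for one task.
func requestTask(addr, workerID string) (int, error) {
	client, err := rpc.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer client.Close()
	var reply TaskReply
	args := TaskArgs{WorkerID: workerID}
	if err := client.Call("DemoCoordinator.RequestTask", &args, &reply); err != nil {
		return 0, err
	}
	return reply.TaskID, nil
}

func main() {
	c := &DemoCoordinator{tasks: make(chan int, 2)}
	c.tasks <- 0
	c.tasks <- 1
	l, err := serve(c)
	if err != nil {
		panic(err)
	}
	defer l.Close()
	// The third request finds the queue empty and receives -1.
	for i := 0; i < 3; i++ {
		id, err := requestTask(l.Addr().String(), "w1")
		fmt.Println(id, err)
	}
}
```

Returning a sentinel such as -1 instead of blocking inside the RPC handler is one possible design choice; it lets the worker decide whether to retry or exit.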

Basic Data Structures

Coordinator

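type Coordinator struct {
	// Your definitions here.
	nReduce     int      // number of reduce tasks
	nMap        int      // number of map tasks
	workerLists sync.Map // state of every registered Worker
	startReduce chan bool

	// MapTask
	muMapTask       sync.Mutex
	mapTaskNeedExec int
	mapTaskLists    []*MapTask    // all map tasks
	mapTaskQueue    chan *MapTask // concurrent queue Workers fetch map tasks from

	// ReduceTask
	muReduceTask       sync.Mutex
	reduceTaskNeedExec int
	reduceTaskLists    []*ReduceTask    // all reduce tasks
	reduceTaskQueue    chan *ReduceTask // concurrent queue Workers fetch reduce tasks from
}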

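workerLists tracks the state of every Worker. mapTaskQueue and reduceTaskQueue are concurrent queues (Go channels) from which Workers fetch tasks concurrently, while mapTaskLists and reduceTaskLists store all of the tasks.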
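
To illustrate why a channel works as a concurrent task queue, here is a minimal sketch in which several goroutines pull from one buffered channel. DemoTask and drain are hypothetical names introduced for this example. In the lab the consumers would be RPC handlers serving Workers rather than local goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// DemoTask is a hypothetical stand-in for MapTask / ReduceTask.
type DemoTask struct {
	ID int
}

// drain starts nWorkers goroutines that pull tasks from queue until it is
// closed and empty, and returns how many times each task ID was received.
func drain(queue chan *DemoTask, nWorkers int) map[int]int {
	var mu sync.Mutex
	seen := make(map[int]int)
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// A channel receive delivers each value to exactly one receiver,
			// so no extra locking is needed around the queue itself.
			for t := range queue {
				mu.Lock() // the lock only protects the result map
				seen[t.ID]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return seen
}

func main() {
	tasks := []*DemoTask{{ID: 0}, {ID: 1}, {ID: 2}, {ID: 3}}
	queue := make(chan *DemoTask, len(tasks))
	for _, t := range tasks {
		queue <- t
	}
	close(queue)
	fmt.Println(drain(queue, 3))
}
```

Each task ID is received exactly once regardless of how many consumers compete for it, which is the property the task queues rely on.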

Worker

type worker struct {
	id       string
	nReduce  int
	needExit chan bool // signals that all tasks are done and the Worker can exit
}

needExit is used to determine whether the current Worker can exit, that is, whether all tasks have been completed.

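Functionality

Worker Registration

Each Worker that joins the cluster must first register with the Coordinator. When the Coordinator receives a registration request, it validates the request and, if it is valid, adds the Worker to workerLists.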
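
The validate-then-add step can be made atomic with sync.Map's LoadOrStore, so that two concurrent registrations with the same ID cannot both succeed. The following is a minimal sketch under that assumption. registry, workerInfo, register, and errDuplicateWorker are hypothetical names, and the empty-ID check is only an illustrative validity rule.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDuplicateWorker is a hypothetical error value for this sketch.
var errDuplicateWorker = errors.New("duplicate worker")

// workerInfo is a hypothetical per-worker record.
type workerInfo struct {
	id string
}

// registry is a hypothetical stand-in for the Coordinator's workerLists.
type registry struct {
	workers sync.Map // worker ID -> *workerInfo
}

// register adds a worker unless its ID is empty or already present.
// LoadOrStore performs the existence check and the insert as one step.
func (r *registry) register(id string) error {
	if id == "" {
		return errors.New("empty worker id")
	}
	if _, loaded := r.workers.LoadOrStore(id, &workerInfo{id: id}); loaded {
		return errDuplicateWorker
	}
	return nil
}

func main() {
	r := &registry{}
	fmt.Println(r.register("1234")) // first registration succeeds
	fmt.Println(r.register("1234")) // second one is rejected as a duplicate
}
```

A separate Load followed by Store leaves a window in which two requests with the same ID can both pass the check; LoadOrStore closes that window.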

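// worker.go
func (w *worker) register() {
	// Use the process ID as the Worker ID.
	w.id = strconv.Itoa(os.Getpid())
	reply := RegisterReply{}
	args := RegisterArgs{WorkerID: w.id}
	call("Coordinator.Register", &args, &reply)
	// The Coordinator tells the Worker how many reduce tasks there are.
	w.nReduce = reply.ReduceNum
}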

// coordinator.go
func (c *Coordinator) Register(args *RegisterArgs, reply *RegisterReply) error {
	workerID := args.WorkerID
	_, exist := c.workerLists.Load(workerID)

	// Reject a Worker ID that is already registered.
	if exist {
		return errors.New(ErrDuplicateWorker)
	}
	reply.ReduceNum = c.nReduce
	worker := workerRecord
