MIT6.824 第一课MapReduce 以及实验思路

What is a distributed system?
  multiple cooperating computers
  storage for big web sites, MapReduce, peer-to-peer sharing, &c
  lots of critical infrastructure is distributed

 

Why do people build distributed systems?
  to increase capacity via parallelism
  to tolerate faults via replication
  to place computing physically close to external entities
  to achieve security via isolation

But:
  many concurrent parts, complex interactions
  must cope with partial failure
  tricky to realize performance potential

 

MAIN TOPICS

This is a course about infrastructure for applications.
  * Storage.
  * Communication.
  * Computation.

The big goal: abstractions that hide the complexity of distribution.
  A couple of topics will come up repeatedly in our search.

Topic: implementation
  RPC, threads, concurrency control.
  The labs...

Topic: performance
  The goal: scalable throughput
    Nx servers -> Nx total throughput via parallel CPU, disk, net.
    [diagram: users, application servers, storage servers]
    So handling more load only requires buying more computers.
      Rather than re-design by expensive programmers.
    Effective when you can divide work w/o much interaction.
  Scaling gets harder as N grows:
    Load im-balance, stragglers, slowest-of-N latency.
    Non-parallelizable code: initialization, interaction.
    Bottlenecks from shared resources, e.g. network.
  Some performance problems aren't easily solved by scaling
    e.g. quick response time for a single user request
    e.g. all users want to update the same data
    often requires better design rather than just more computers
  Lab 4

Topic: fault tolerance
  1000s of servers, big network -> always something broken
  We'd like to hide these failures from the application.
  We often want:
    Availability -- app can make progress despite failures
    Recoverability -- app will come back to life when failures are repaired
  Big idea: replicated servers.
    If one server crashes, can proceed using the other(s).
    Labs 1, 2 and 3

Topic: consistency
  General-purpose infrastructure needs well-defined behavior.
    E.g. "Get(k) yields the value from the most recent Put(k,v)."
  Achieving good behavior is hard!
    "Replica" servers are hard to keep identical.
    Clients may crash midway through multi-step update.
    Servers may crash, e.g. after executing but before replying.
    Network partition may make live servers look dead; risk of "split brain".
  Consistency and performance are enemies.
    Strong consistency requires communication,
      e.g. Get() must check for a recent Put().
    Many designs provide only weak consistency, to gain speed.
      e.g. Get() does *not* yield the latest Put()!
      Painful for application programmers but may be a good trade-off.

 

CASE STUDY: MapReduce

Let's talk about MapReduce (MR) as a case study
  a good illustration of 6.824's main topics
  hugely influential
  the focus of Lab 1

MapReduce overview
  context: multi-hour computations on multi-terabyte data-sets
    e.g. build search index, or sort, or analyze structure of web
    only practical with 1000s of computers
    applications not written by distributed systems experts
  overall goal: easy for non-specialist programmers
  programmer just defines Map and Reduce functions
    often fairly simple sequential code
  MR takes care of, and hides, all aspects of distribution!
  
Abstract view of a MapReduce job
  input is (already) split into M files
  Input1 -> Map -> a,1 b,1
  Input2 -> Map ->     b,1
  Input3 -> Map -> a,1     c,1
                    |   |   |
                    |   |   -> Reduce -> c,1
                    |   -----> Reduce -> b,2
                    ---------> Reduce
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值