6.824 2021 Lecture 1: Introduction

License: CC 4.0 BY-SA

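This post looks at how P2P relates to distributed systems, notes that distributed systems achieve fault tolerance through replication, and stresses the trade-offs among consistency, fault tolerance, and performance in a distributed setting. It focuses on the MapReduce (MR) model and how it works: data locality to reduce network use, load-balancing strategies, and failure handling that ensures high availability and determinism. It also discusses the challenges and limitations MR faces when processing large-scale data, and its influence on modern distributed computing.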

What is a distributed system?
  multiple cooperating computers
  storage for big web sites, MapReduce, peer-to-peer sharing, &c
  lots of critical infrastructure is distributed

How does P2P relate to distributed systems? Is it because all the nodes are equal peers?

Why do people build distributed systems?
  to increase capacity via parallelism
  to tolerate faults via replication
  to place computing physically close to external entities
  to achieve security via isolation

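A distributed system can handle more requests, so capacity goes up?

Fault tolerance through replication.

Because the system is distributed, some computing can be placed near certain entities?

Security via isolation? How does that relate to being distributed?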

But:
  many concurrent parts, complex interactions
  must cope with partial failure
  tricky to realize performance potential

Nodes need to communicate and interact with one another.

Partial failures can occur.

Scaling well is hard to achieve.

Why take this course?
  interesting -- hard problems, powerful solutions
  used by real systems -- driven by the rise of big Web sites
  active research area -- important unsolved problems
  hands-on -- you'll build real systems in the labs

Course components:
  lectures
  papers
  two exams
  labs
  final project (optional)

Lectures:
  big ideas, paper discussion, and labs
  will be video-taped, available online

Papers:
  research papers, some classic, some new
  problems, ideas, implementation details, evaluation
  many lectures focus on papers
  please read papers before class!
  each paper has a short question for you to answer
  and we ask you to send us a question you have about the paper
  submit question&answer before start of lecture

Labs:
  goal: deeper understanding of some important techniques
  goal: experience with distributed programming
  first lab is due a week from Friday
  one per week after that for a while

Lab 1: MapReduce
Lab 2: replication for fault-tolerance using Raft
Lab 3: fault-tolerant key/value store
Lab 4: sharded key/value store

This is a course about infrastructure for applications.
  * Storage.
  * Communication.
  * Computation.

The big goal: abstractions that hide the complexity of distribution.

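Infrastructure: storage, communication, computation.

The goal is to hide the technical details of distribution so that applications can focus on business logic (most frameworks have much the same purpose).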

Topic: fault tolerance
  1000s of servers, big network -> always something broken
    We'd like to hide these failures from the application.
  We often want:
    Availability -- app can make progress despite failures
    Recoverability -- app will come back to life when failures are repaired
  Big idea: replicated servers.
    If one server crashes, can proceed using the other(s).
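
The "replicated servers" idea above can be sketched in a few lines. This is only an in-process simulation under stated assumptions, not anything from the lecture or the labs: the names `Replica`, `ReplicaDown`, and `ReplicatedStore` are hypothetical, a crash is modeled by flipping an `alive` flag, and only reads are covered. Keeping replicas consistent when data is written is the hard part and is not attempted here.

```python
class ReplicaDown(Exception):
    """Hypothetical signal that a replica has crashed and cannot answer."""


class Replica:
    """One simulated server holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True  # set to False to simulate a crash

    def get(self, key):
        if not self.alive:
            raise ReplicaDown(self.name)
        return self.data[key]


class ReplicatedStore:
    """Client-side wrapper that hides individual replica crashes from the app."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def get(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ReplicaDown:
                continue  # this replica is down; proceed using the other(s)
        # Only reached when every replica has failed.
        raise RuntimeError("all replicas are down")


# Three replicas, each holding the same data.
replicas = [Replica(f"s{i}", {"x": 1}) for i in range(3)]
store = ReplicatedStore(replicas)

replicas[0].alive = False   # one server crashes
value = store.get("x")      # the read is still served by another replica
```

The application only ever calls `store.get`, so a single crash is invisible to it (availability). If every replica is down, the call fails until at least one replica is brought back.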