What is a distributed system?
  multiple cooperating computers
  storage for big web sites, MapReduce, peer-to-peer sharing, &c
  lots of critical infrastructure is distributed
How does P2P relate to distributed systems? Is it because all the nodes are equal peers?
Why do people build distributed systems?
  to increase capacity via parallelism
  to tolerate faults via replication
  to place computing physically close to external entities
  to achieve security via isolation
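A distributed system can handle more requests, so capacity goes up?
Fault tolerance comes from replication.
Because the system is distributed, some computing can be placed near certain entities?
Security via isolation? How does that relate to distribution?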
But:
  many concurrent parts, complex interactions
  must cope with partial failure
  tricky to realize performance potential
Nodes have to communicate and interact with each other;
partial failures can occur;
scaling well is hard to achieve.
Why take this course?
  interesting -- hard problems, powerful solutions
  used by real systems -- driven by the rise of big Web sites
  active research area -- important unsolved problems
  hands-on -- you'll build real systems in the labs

Course components:
  lectures
  papers
  two exams
  labs
  final project (optional)

Lectures:
  big ideas, paper discussion, and labs
  will be video-taped, available online

Papers:
  research papers, some classic, some new
  problems, ideas, implementation details, evaluation
  many lectures focus on papers
  please read papers before class!
  each paper has a short question for you to answer
  and we ask you to send us a question you have about the paper
  submit question&answer before start of lecture

Labs:
  goal: deeper understanding of some important techniques
  goal: experience with distributed programming
  first lab is due a week from Friday
  one per week after that for a while

Lab 1: MapReduce
Lab 2: replication for fault-tolerance using Raft
Lab 3: fault-tolerant key/value store
Lab 4: sharded key/value store

This is a course about infrastructure for applications.
  * Storage.
  * Communication.
  * Computation.

The big goal: abstractions that hide the complexity of distribution.
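Infrastructure! Storage, communication, computation.
The goal is to hide the technical details of distribution so that applications can focus on business logic (most frameworks aim for much the same thing).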
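
To make the idea of hiding distribution behind an abstraction concrete, here is a minimal sketch in Python. It is not from the lecture. `Node` and `ShardedStore` are hypothetical names, and the "nodes" are in-process objects standing in for remote servers. The application only sees `put` and `get`; which node holds a key is decided inside the store.

```python
import hashlib


class Node:
    """Stand-in for one storage server (hypothetical; a real one would be remote)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ShardedStore:
    """Looks like a single dictionary to the application; keys are spread over nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        # Use a stable hash so every client maps a given key to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key).data[key] = value

    def get(self, key):
        # Returns None when the key was never stored.
        return self._node_for(key).data.get(key)


# The caller never mentions a node: distribution is an internal detail.
store = ShardedStore([Node("n0"), Node("n1"), Node("n2")])
for i in range(100):
    store.put(f"user:{i}", i)
```

This sketch assumes a fixed set of nodes and ignores failures and rebalancing, which is where most of the real difficulty lies.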
Topic: fault tolerance
  1000s of servers, big network -> always something broken
  We'd like to hide these failures from the application.
  We often want:
    Availability -- app can make progress despite failures
    Recoverability -- app will come back to life when failures are repaired
  Big idea: replicated servers.
    If one server crashes, can proceed using the other(s).
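
The "replicated servers" idea can be illustrated with a small Python sketch. It is an assumption-laden toy, not the course's method: `Replica`, `ReplicatedClient`, and `ServerDown` are hypothetical names, crashes are simulated with a flag, and the client simply writes to every reachable replica and reads from the first one that answers.

```python
class ServerDown(Exception):
    """Raised when a replica cannot be reached (simulated crash)."""


class Replica:
    """In-process stand-in for one replicated server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

    def put(self, key, value):
        if not self.alive:
            raise ServerDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ServerDown(self.name)
        return self.data[key]


class ReplicatedClient:
    """Hides individual replica failures from the application."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        # Write to every reachable replica; fail only if none accepted the write.
        acked = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acked += 1
            except ServerDown:
                continue
        if acked == 0:
            raise ServerDown("all replicas are down")
        return acked

    def get(self, key):
        # Try replicas in order; the first live one answers.
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ServerDown:
                continue
        raise ServerDown("all replicas are down")


replicas = [Replica("a"), Replica("b"), Replica("c")]
client = ReplicatedClient(replicas)
client.put("x", 1)
replicas[0].alive = False   # simulate a crash of the first server
value = client.get("x")     # still answered, by a surviving replica
```

The application keeps making progress after the crash, which is the availability property above. The sketch deliberately ignores consistency: a replica that was down during a write would miss it and could serve stale data after recovering. Keeping replicas in agreement is the hard part, and is what a protocol such as Raft (Lab 2) addresses.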

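This article explores the relationship between P2P and distributed systems, notes that distributed systems achieve fault tolerance through replication, and emphasizes the trade-offs among consistency, fault tolerance, and performance in a distributed setting. It focuses on the MapReduce (MR) model and explains how it works: data locality to reduce network use, load-balancing strategies, and failure handling to ensure high availability and determinism. It also discusses the challenges and limitations MR faces when processing large-scale data, and its influence on modern distributed computing.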



