Understanding the CAP Theorem

Translated from https://dzone.com/articles/understanding-the-cap-theorem

In this article, we take an exploratory look at one of the more important ideas in the field of data engineering and at where it stands today.

The CAP theorem is a tool that makes system designers aware of the trade-offs involved in designing networked shared-data systems. It has influenced the design of many distributed data systems. Over the years, however, CAP has also become a widely misunderstood tool for categorizing databases. There is much misinformation floating around about it, and many blog posts on CAP are outdated and possibly incorrect.

It is important to understand CAP so that you can identify the misinformation around it.

The CAP theorem applies to distributed systems that store state. At the 2000 Symposium on Principles of Distributed Computing (PODC), Eric Brewer conjectured that in any networked shared-data system there is a fundamental trade-off between consistency, availability, and partition tolerance. In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer’s conjecture. The theorem states that a networked shared-data system can guarantee (strongly support) at most two of the following three properties:

Consistency - A guarantee that every node in a distributed cluster returns the same, most recent, successful write. Consistency refers to every client having the same view of the data. There are various types of consistency models. Consistency in CAP (as used to prove the theorem) refers to linearizability, also called atomic consistency, which is a very strong form of consistency.

Availability - Every non-failing node returns a response for all read and write requests in a reasonable amount of time. The key word here is "every". To be available, every node (on either side of a network partition) must be able to respond in a reasonable amount of time.

Partition Tolerance - The system continues to function and upholds its consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed systems that guarantee partition tolerance can gracefully recover once the partition heals.
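To make the consistency definition above concrete, here is a minimal sketch that scans a history of client operations and reports the first read that did not return the most recent successful write. It is not a full linearizability checker: it assumes operations never overlap in time, so the list order is the completion order. The names `Op` and `first_stale_read` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """One completed client operation (hypothetical model)."""
    node: str             # node that served the request
    kind: str             # "write" or "read"
    value: Optional[int]  # value written, or value returned by the read


def first_stale_read(history: List[Op]) -> Optional[Op]:
    """Return the first read that missed the latest write, or None.

    Assumption: operations do not overlap, so list order is the order
    in which they completed.
    """
    latest = None
    for op in history:
        if op.kind == "write":
            latest = op.value
        elif op.value != latest:
            return op
    return None


consistent = [Op("n1", "write", 1), Op("n2", "read", 1)]
stale = [Op("n1", "write", 1), Op("n1", "write", 2), Op("n2", "read", 1)]

print(first_stale_read(consistent))
print(first_stale_read(stale))
```

In the second history, node `n2` answers with the value 1 after the value 2 has already been written, which is exactly the kind of response that consistency in the CAP sense rules out.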

The C and A in ACID (Atomicity, Consistency, Isolation, Durability) represent different concepts than the C and A in the CAP theorem.

The CAP theorem categorizes systems into three categories:

CP (Consistent and Partition Tolerant) - At first glance, the CP category is confusing: it seems to describe a system that is consistent and partition tolerant but never available. In fact, CP refers to systems that sacrifice availability only in the case of a network partition.

CA (Consistent and Available) - CA systems are consistent and available in the absence of any network partition. Single-node DB servers are often categorized as CA systems because they do not need to deal with partition tolerance. The hole in this reasoning is that a single-node DB system is not a networked shared-data system and thus does not fall under the purview of CAP. [^11]

AP (Available and Partition Tolerant) - These systems are available and partition tolerant but cannot guarantee consistency.
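The difference between the CP and AP categories shows up only while a node is cut off from its peers. The sketch below models a single replica with a configurable policy. `Policy`, `Replica`, `Unavailable`, and `try_read` are hypothetical names used for illustration, and the `partitioned` flag stands in for real failure detection, which is assumed rather than implemented.

```python
from enum import Enum


class Policy(Enum):
    CP = "consistent and partition tolerant"
    AP = "available and partition tolerant"


class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Replica:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.value = None         # last value this replica has seen
        self.partitioned = False  # True while cut off from its peers

    def read(self):
        if self.partitioned and self.policy is Policy.CP:
            # A newer write may exist on the other side of the partition.
            raise Unavailable("cannot guarantee the most recent write")
        # Normal operation, or an AP node answering with what it has.
        return self.value


def try_read(replica: Replica) -> str:
    try:
        return f"ok: {replica.read()}"
    except Unavailable as exc:
        return f"unavailable: {exc}"


for policy in (Policy.CP, Policy.AP):
    node = Replica(policy)
    node.value = "v1"
    print(policy.name, "no partition ->", try_read(node))
    node.partitioned = True
    print(policy.name, "partitioned  ->", try_read(node))
```

Both policies behave identically without a partition. Once the flag is set, the CP replica refuses to answer, while the AP replica returns its local value, which may be stale.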

A Venn diagram or a triangle is frequently used to visualize the CAP theorem, with systems falling into the three categories depicted by the intersecting circles.

[Figure: Visualizing the CAP theorem]

The part where all three sections intersect is white because it is impossible to have all three properties in a networked shared-data system. However, any visualization of the CAP theorem as a triangle or a Venn diagram is misleading. The correct way to think about CAP is to ask what happens in case of a network partition (a rare occurrence). In any networked shared-data system, partition tolerance is a must: network partitions and dropped messages are a fact of life and must be handled appropriately. Consequently, system designers must choose between consistency and availability.

Simplistically speaking, a network partition forces designers to choose either perfect consistency or perfect availability. Picking consistency means not being able to answer a client’s query, because the system cannot guarantee to return the most recent write. This sacrifices availability: the partition forces non-failing nodes to reject clients’ requests, since these nodes cannot guarantee consistent data. At the opposite end of the spectrum, being available means being able to respond to a client’s request without guaranteeing consistency, i.e., without guaranteeing that the response reflects the most recent value written. Available systems provide the best possible answer under the given circumstances.
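One common way to pick consistency during a partition is to require a majority of nodes for every request. The article does not prescribe this technique, so treat the following as an illustrative sketch under that assumption. `QuorumRegister` and `Unavailable` are hypothetical names, partitions are simulated by explicit groups of node names, and versions are plain counters.

```python
from typing import Dict, List, Optional, Set, Tuple


class Unavailable(Exception):
    """Raised when a node cannot reach a majority of the cluster."""


class QuorumRegister:
    """Toy replicated register; every request needs a majority of nodes."""

    def __init__(self, nodes: List[str]) -> None:
        # Each node stores a (version, value) pair.
        self.store: Dict[str, Tuple[int, Optional[str]]] = {
            n: (0, None) for n in nodes
        }
        self.groups: List[Set[str]] = [set(nodes)]  # one group: no partition

    def partition(self, *groups: Set[str]) -> None:
        self.groups = list(groups)

    def heal(self) -> None:
        self.groups = [set(self.store)]

    def _majority_peers(self, node: str) -> Set[str]:
        peers = next(g for g in self.groups if node in g)
        if 2 * len(peers) <= len(self.store):
            raise Unavailable(f"{node} cannot reach a majority")
        return peers

    def write(self, node: str, value: str) -> None:
        peers = self._majority_peers(node)
        version = max(self.store[p][0] for p in peers) + 1
        for p in peers:
            self.store[p] = (version, value)

    def read(self, node: str) -> Optional[str]:
        peers = self._majority_peers(node)
        # Any two majorities overlap, so the highest version is the latest.
        newest = max((self.store[p] for p in peers), key=lambda pair: pair[0])
        return newest[1]


reg = QuorumRegister(["n1", "n2", "n3"])
reg.write("n1", "a")
reg.partition({"n1", "n2"}, {"n3"})
reg.write("n1", "b")       # the majority side still accepts writes
print(reg.read("n2"))
try:
    reg.read("n3")         # the minority side must refuse
except Unavailable as exc:
    print("rejected:", exc)
reg.heal()
print(reg.read("n3"))
```

During the partition, `n3` has not failed, yet it rejects requests because it cannot rule out a newer write on the other side. In CAP terms the system has given up availability to keep consistency, even though clients that can reach the majority side are still served.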

During normal operation (no network partition), the CAP theorem does not impose constraints on availability or consistency.

The CAP theorem is responsible for instigating the discussion about the various trade-offs in a distributed shared-data system. It has played a pivotal role in increasing our understanding of shared-data systems. Nonetheless, the CAP theorem is criticized for being too simplistic and often misleading. Over a decade after the CAP theorem was introduced, Brewer acknowledged that it oversimplified the choices available in the event of a network partition. According to Brewer, the CAP theorem prohibits only a “tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.” System designers have a broad range of options for dealing with and recovering from network partitions. The goal of every system must be to “maximize combinations of consistency and availability that make sense for the specific application.”
