Two-Phase Commit (2PC)
2PC is one way to implement distributed transactions.
What is a 2PC transaction
2PC uses a new component that does not normally appear in single-node transactions: a coordinator (also known as transaction manager). The coordinator is often implemented as a library within the same application process that is requesting the transaction (e.g., embedded in a Java EE container), but it can also be a separate process or service.
A 2PC transaction begins with the application reading and writing data on multiple database nodes, as normal. We call these database nodes participants in the transaction. When the application is ready to commit, the coordinator begins phase 1: it sends a prepare request to each of the nodes, asking them whether they are able to commit. The coordinator then tracks the responses from the participants:
- If all participants reply “yes,” indicating they are ready to commit, then the coordinator sends out a commit request in phase 2, and the commit actually takes place.
- If any of the participants replies “no,” the coordinator sends an abort request to all nodes in phase 2.
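The phase-1 rule above can be sketched as a small decision function. This is a minimal illustration, not a real library API: `Vote`, `Decision` and `decide` are hypothetical names. A missing vote (`None`) stands in for a participant that crashed or timed out, and the coordinator treats it the same as a “no”.

```python
from enum import Enum
from typing import Iterable, Optional


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def decide(votes: Iterable[Optional[Vote]]) -> Decision:
    """Hypothetical phase-1 outcome: commit only if every participant voted YES.

    None models a participant that crashed or whose reply timed out;
    it is treated exactly like a NO vote.
    """
    for vote in votes:
        if vote is not Vote.YES:
            return Decision.ABORT
    return Decision.COMMIT
```

The rule is all-or-nothing: a single “no” or missing reply is enough to abort the whole transaction.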
Why 2PC guarantees atomic commit across multiple nodes
- When the application wants to begin a distributed transaction, it requests a globally unique transaction ID from the coordinator.
- The application begins a single-node transaction on each of the participants and attaches the global transaction ID to each of them. All reads and writes are done in one of these single-node transactions. If anything goes wrong at this stage (for example, a node crashes or a request times out), the coordinator or any of the participants can abort.
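- When the application is ready to commit, the coordinator sends a prepare request to all participants, tagged with the global transaction ID. If any of these requests fails or times out, the coordinator sends an abort request for that transaction ID to all participants.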
- When a participant receives the prepare request, it may reply “yes” only if it is certain it can commit the transaction under all circumstances. This includes writing all transaction data to disk (a crash, a power failure, or running out of disk space is not an acceptable excuse for refusing to commit later), and checking for any conflicts or constraint violations. By replying “yes” to the coordinator, the participant promises that it will commit successfully once it receives the commit request.
- After the coordinator has received responses to all prepare requests, it makes a final decision on whether to commit or abort the transaction (committing only if all participants voted “yes”). The coordinator must write this decision to its transaction log on disk, so that even if it crashes afterwards it still knows whether the transaction was committed or aborted. The moment this record is written to disk is called the commit point.
- Once the coordinator’s decision is on disk, it sends the commit or abort request to all participants. If this request fails or times out, the coordinator keeps retrying until it succeeds. There is no more going back: if the decision was to commit, that decision must be enforced, no matter how many retries it takes. If a participant has crashed in the meantime, the transaction will be committed when it recovers: since the participant voted “yes”, it cannot refuse to commit when it recovers.
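The steps above can be sketched end to end as an in-memory simulation. All names here (`Participant`, `Coordinator`, `fail_first_commits`) are hypothetical. The coordinator’s `log` is a plain list standing in for its on-disk transaction log, and a `ConnectionError` raised by `commit` simulates a lost phase-2 request. The sketch only illustrates the ordering of the protocol: global transaction ID, prepare, decision recorded at the commit point, then phase 2 retried until it gets through.

```python
import uuid
from typing import Dict, List, Tuple


class Participant:
    """Hypothetical in-memory participant; a real database keeps this state on disk."""

    def __init__(self, name: str, can_commit: bool = True, fail_first_commits: int = 0) -> None:
        self.name = name
        self.can_commit = can_commit
        # Number of commit requests to "lose" before one gets through (simulated faults).
        self.fail_first_commits = fail_first_commits
        self.state: Dict[str, str] = {}

    def prepare(self, txid: str) -> bool:
        # Vote "yes" only if the transaction can definitely commit later.
        if not self.can_commit:
            self.state[txid] = "aborted"  # aborting is still allowed before voting yes
            return False
        self.state[txid] = "prepared"  # a real node would flush its writes to disk first
        return True

    def commit(self, txid: str) -> None:
        if self.fail_first_commits > 0:
            self.fail_first_commits -= 1
            raise ConnectionError(f"{self.name}: commit request lost")
        self.state[txid] = "committed"

    def abort(self, txid: str) -> None:
        self.state[txid] = "aborted"


class Coordinator:
    def __init__(self, participants: List[Participant]) -> None:
        self.participants = participants
        # Stands in for the coordinator's on-disk transaction log.
        self.log: List[Tuple[str, str]] = []

    def begin(self) -> str:
        # Globally unique transaction ID shared by all single-node transactions.
        return str(uuid.uuid4())

    def commit(self, txid: str) -> str:
        # Phase 1: send prepare to every participant and collect the votes.
        votes = [p.prepare(txid) for p in self.participants]
        decision = "commit" if all(votes) else "abort"
        # Commit point: the decision is logged before any participant is told.
        self.log.append((txid, decision))
        # Phase 2: deliver the decision, retrying until each participant accepts it.
        for p in self.participants:
            while True:
                try:
                    if decision == "commit":
                        p.commit(txid)
                    else:
                        p.abort(txid)
                    break
                except ConnectionError:
                    continue  # no going back: retry (a real system would back off)
        return decision
```

A production coordinator would also time out on prepare, back off between retries, and write its log durably; those details are left out to keep the ordering visible.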
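The 2PC protocol therefore contains two crucial “points of no return”:
- Once a participant (node) has replied “yes,” it cannot change its mind: it must be able to commit successfully, no matter what happens.
- Once the coordinator has made its decision, that decision is irrevocable.
In a single-node transaction, these two steps are merged into one: writing the commit record to the transaction log.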
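The two points of no return can be enforced as a per-transaction state machine on each participant. This is an assumed design for illustration, and the `State` enum and `ParticipantTx` class are hypothetical. Once a transaction is in `PREPARED`, it can no longer abort on its own. Because the coordinator retries, the same decision may arrive more than once, so applying it is idempotent.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"    # voted "yes": in doubt until the decision arrives
    COMMITTED = "committed"
    ABORTED = "aborted"


class ParticipantTx:
    """Hypothetical state machine for one transaction on one participant."""

    def __init__(self) -> None:
        self.state = State.ACTIVE

    def vote_yes(self) -> None:
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"cannot vote yes from {self.state.value}")
        self.state = State.PREPARED

    def abort_locally(self) -> None:
        # Point of no return 1: a unilateral abort is only allowed before voting yes.
        if self.state is State.PREPARED:
            raise RuntimeError("already voted yes: must wait for the coordinator")
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"transaction already {self.state.value}")
        self.state = State.ABORTED

    def apply_decision(self, commit: bool) -> None:
        # Point of no return 2: the coordinator's decision is final, and retries
        # may deliver it more than once, so applying it twice is a no-op.
        target = State.COMMITTED if commit else State.ABORTED
        if self.state is target:
            return
        if commit and self.state is not State.PREPARED:
            raise RuntimeError(f"cannot commit from {self.state.value}")
        if not commit and self.state is State.COMMITTED:
            raise RuntimeError("abort conflicts with an earlier commit")
        self.state = target
```

An abort decision is still accepted in `ACTIVE`, which covers the case where the coordinator aborts because some other participant failed to prepare.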
Coordinator failure
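Before the coordinator sends the prepare request, a participant can safely abort the transaction on its own. But once the participant has received the prepare request and replied “yes,” it can no longer abort unilaterally and must wait for the coordinator’s next instruction. If the coordinator crashes or the network fails at this point, the participant can do nothing but wait.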
A participant’s transaction in this state is called in doubt or uncertain.
For example, suppose the coordinator sends a commit request to database-2 but crashes before it sends the commit request to database-1. Database-1 then has no way of knowing whether to commit or abort. A timeout does not help either: if database-1 aborts after timing out, it becomes inconsistent with database-2, which has committed. Nor can it commit on its own, because as far as it knows database-2 may have voted “no” and the transaction may have been aborted. Without hearing from the coordinator, the participant has no way of knowing whether to commit or abort. In principle, the participants could communicate among themselves to find out how each participant voted and come to some agreement, but that is not part of the 2PC protocol.
This is why the coordinator must write its commit or abort decision to the transaction log on disk before sending it to the participants: when the coordinator recovers, it determines the status of all in-doubt transactions by reading its transaction log. Any transactions that don’t have a commit record in the coordinator’s log are aborted. Thus, the commit point of 2PC comes down to a regular single-node atomic commit on the coordinator.
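The recovery rule can be sketched as a pure function over the coordinator’s log. This minimal sketch assumes the log is a list of `(txid, decision)` pairs that reached disk before the crash, and that the set of in-doubt transaction IDs is known (for example, reported by the participants). The function name `recover` is hypothetical. Only an explicit commit record leads to commit, and everything else is aborted.

```python
from typing import Dict, Iterable, Tuple


def recover(log: Iterable[Tuple[str, str]], in_doubt: Iterable[str]) -> Dict[str, str]:
    """Hypothetical recovery step run by the coordinator after a restart.

    A transaction commits only if the log holds an explicit commit record;
    a missing record means the commit point was never reached, so it aborts.
    """
    committed = {txid for txid, decision in log if decision == "commit"}
    return {txid: ("commit" if txid in committed else "abort") for txid in in_doubt}
```

For instance, with a log containing a commit for `t1` and an abort for `t2`, an in-doubt `t3` that never got a record is resolved as abort.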
Implementation
X/Open XA (short for eXtended Architecture) is a standard for implementing two phase commit across multiple different data systems.
XA is not a network protocol—it is merely a C API for interfacing with a transaction coordinator. Bindings for this API exist in other languages; for example, in the world of Java EE applications, XA transactions are implemented using the Java Transaction API (JTA), which in turn is supported by many drivers for databases using Java Database Connectivity (JDBC) and drivers for message brokers using the Java Message Service (JMS) APIs.
The transaction coordinator implements the XA API. The standard does not specify how it should be implemented, but in practice the coordinator is often simply a library that is loaded into the same process as the application issuing the transaction (not a separate service). It keeps track of the participants in a transaction, collects participants’ responses after asking them to prepare (via a callback into the driver), and uses a log on the local disk to keep track of the commit/abort decision for each transaction.
If the application process crashes, or the machine on which the application is running dies, the coordinator goes with it. Since the coordinator’s log is on the application server’s local disk, that server must be restarted, and the coordinator library must read the log to recover the commit/abort outcome of each transaction.
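An in-process coordinator library needs its log to survive a crash of the application. The sketch below uses a hypothetical `DecisionLog` class. It appends one JSON line per decision to a local file and calls `os.fsync` before returning, so the decision is on disk before any participant hears about it. `load` is what the library would run when the application server restarts. This is an illustration only and does not describe how any particular XA/JTA implementation stores its log.

```python
import json
import os
from typing import Dict


class DecisionLog:
    """Hypothetical append-only decision log on the coordinator's local disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, txid: str, decision: str) -> None:
        # Commit point: once fsync returns, the decision survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"txid": txid, "decision": decision}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def load(self) -> Dict[str, str]:
        # Run by the coordinator library when the application server restarts.
        decisions: Dict[str, str] = {}
        if not os.path.exists(self.path):
            return decisions
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    # Assumed to be a torn final record: that decision never became durable.
                    break
                decisions[entry["txid"]] = entry["decision"]
        return decisions
```

A fully durable version would also fsync the containing directory the first time the log file is created. If the machine’s disk is lost, this log is lost with it, which is exactly the failure mode described above.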
The Limitations of 2PC
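Why do we care so much about a participant’s in-doubt transactions, that is, transactions that have replied “yes” but have not yet received a commit or abort decision? Before replying “yes” to a prepare request, a participant must be sure it can commit, so it holds locks on the data involved. These locks guarantee that the data keeps satisfying its constraints and is not modified by other transactions.
If the participant never receives the coordinator’s decision, it keeps holding these locks indefinitely, which blocks subsequent transactions.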
While those locks are held, no other transaction can modify those rows. Depending on the database, other transactions may even be blocked from reading those rows.
Thus, other transactions cannot simply continue with their business: if they want to access that same data, they will be blocked. This can cause large parts of your application to become unavailable until the in-doubt transaction is resolved.
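The blocking effect can be reproduced with ordinary locks. In this sketch, the `RowLocks` class and the row names are hypothetical. An in-doubt transaction acquires a row lock when it prepares and never releases it, because the coordinator never answers. Every later transaction that needs the same row waits until its timeout and then gives up.

```python
import threading
from typing import Dict


class RowLocks:
    """Hypothetical per-row exclusive locks on one participant."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, row: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(row, threading.Lock())

    def acquire(self, row: str, timeout: float) -> bool:
        # Returns False if the row stays locked for `timeout` seconds.
        return self._lock_for(row).acquire(timeout=timeout)

    def release(self, row: str) -> None:
        self._lock_for(row).release()
```

Rows the in-doubt transaction did not touch remain available. Only the data it locked stays unavailable until someone resolves the transaction and calls `release`.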
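In theory, if the coordinator crashes and is restarted, it should recover its state from the log and resolve all in-doubt transactions. In practice, however, a coordinator can fail to recover for various reasons (e.g., because the transaction log has been lost or corrupted due to a software bug).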
Even rebooting your database servers will not fix this problem, since a correct implementation of 2PC must preserve the locks of an in-doubt transaction even across restarts (otherwise it would risk violating the atomicity guarantee). It’s a sticky situation.
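Holding locks across restarts means the participant must durably record which rows a transaction locked when it voted “yes,” and rebuild its lock table from that record on startup. This is a minimal sketch under that assumption. `restore_locks` and the record shape `{"txid": ..., "rows": [...]}` are hypothetical, and `decided` is the set of transactions whose outcome was already applied before the crash.

```python
from typing import Dict, List, Set


def restore_locks(prepare_log: List[dict], decided: Set[str]) -> Dict[str, str]:
    """Hypothetical startup step: map each still-locked row to its in-doubt txid.

    Every transaction that voted yes but has no applied decision is still in
    doubt, so its locks must be re-taken before new requests are served.
    """
    lock_owner: Dict[str, str] = {}
    for record in prepare_log:
        if record["txid"] in decided:
            continue  # outcome already applied; its locks were released
        for row in record["rows"]:
            lock_owner[row] = record["txid"]
    return lock_owner
```

Dropping these locks on restart instead would let other transactions read or overwrite data whose fate is still unknown, which is the atomicity risk described above.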
The only way out is for an administrator to manually decide whether to commit or roll back the transactions.
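In short, 2PC has these drawbacks:
- It adds the complexity of manual recovery.
- The coordinator itself must be highly available (for example, with a backup); otherwise it becomes a single point of failure.
- Every participating system must support the protocol.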
Reference
Martin Kleppmann, Designing Data-Intensive Applications (book)