When a transaction reads data or prewrites keys and finds that a key already carries a Lock, it has to run ResolveLock. Why would a key already be locked? There are several cases:
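- Transaction txn_1 exits abnormally right after it finishes prewrite on key_1 and key_2. When transaction txn_2 later reads key_1 or key_2, it finds the Locks that txn_1 wrote during prewrite.
- Transaction txn_1 finishes prewrite on key_1 but fails on key_2 because of a conflict. txn_1 aborts and cleans up the lock on key_1 asynchronously. If that cleanup has not finished when txn_2 reads key_1, txn_2 also runs into the Lock.
- Transaction txn_1 commits the primary key successfully and then commits the secondary keys asynchronously. If txn_2 reads the secondary keys before the asynchronous commit finishes, it also runs into Locks.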
So how does ResolveLocks work? prewrite sends a LockTTL along with the keys. ResolveLock first checks the TTL of every Lock and records the longest expire time. For the keys whose locks have an expired TTL, it runs resolveLock on those locks. If some locks are still unresolved afterwards, it backs off according to the longest expire time and retries.
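The TTL check described above can be sketched in Go. This is a minimal illustration, not TiDB's implementation: the `Lock` struct, its `ExpireAtMs` field, and the `splitLocks` helper are hypothetical names, and time is modeled as plain milliseconds.

```go
package main

import "fmt"

// Lock is a hypothetical, simplified view of a lock left behind by prewrite.
type Lock struct {
	Key        string
	ExpireAtMs int64 // assumed to be prewrite time + LockTTL, in milliseconds
}

// splitLocks separates expired locks from live ones. It also reports how long
// the caller should back off before retrying: the time until the longest
// expire time among the locks that are still alive (0 if none are alive).
func splitLocks(locks []Lock, nowMs int64) (expired []Lock, waitMs int64) {
	for _, l := range locks {
		if l.ExpireAtMs <= nowMs {
			expired = append(expired, l) // TTL is over: this lock can be resolved now
			continue
		}
		if d := l.ExpireAtMs - nowMs; d > waitMs {
			waitMs = d
		}
	}
	return expired, waitMs
}

func main() {
	locks := []Lock{
		{Key: "key_1", ExpireAtMs: 900},
		{Key: "key_2", ExpireAtMs: 1300},
		{Key: "key_3", ExpireAtMs: 1500},
	}
	expired, waitMs := splitLocks(locks, 1000)
	fmt.Println("expired locks:", len(expired), "back off (ms):", waitMs)
}
```

A caller would resolve the expired locks first, then sleep for the returned duration and check the remaining locks again.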
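Resolving locks that have already expired (the resolveLock function) works as follows:
- Call getTxnStatusFromLock to query the status of the locked key.
- If the lock has expired, send a ResolveLockRequest to the region that holds the locked key. Depending on the lock's status, the request performs a commit or a rollback.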
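The commit-or-rollback decision in the second step can be sketched as follows. The `TxnStatus` and `Action` types and the `decide` function are hypothetical; the sketch assumes that a non-zero commit timestamp means the owning transaction committed, and that anything else is rolled back.

```go
package main

import "fmt"

// TxnStatus is a hypothetical result of looking up the transaction that owns a lock.
// CommitTS == 0 is assumed to mean the transaction did not commit.
type TxnStatus struct {
	CommitTS uint64
}

// Action is what the resolver asks the region to do with the lock.
type Action string

const (
	Commit   Action = "commit"
	Rollback Action = "rollback"
)

// decide maps the transaction status to the action carried by the resolve request.
func decide(s TxnStatus) Action {
	if s.CommitTS > 0 {
		return Commit // the primary key was committed, so this key must be committed too
	}
	return Rollback // the transaction never committed, so the lock is simply removed
}

func main() {
	fmt.Println(decide(TxnStatus{CommitTS: 42}))
	fmt.Println(decide(TxnStatus{}))
}
```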
resolveLocks
On the TiDB side, resolveLocks calls resolveLocksPhysical.
resolveLocksPhysical
Comment:
// resolveLocksPhysical uses TiKV's `PhysicalScanLock` to scan stale locks in the cluster and resolve them. It tries to
// ensure no lock whose ts <= safePoint is left.
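We can express this function in pseudocode:
removeLockObservers   // deferred, runs when the function returns
getStoresMapForGC
for (retry at most 3 times) {
    registerLockObservers
    resolvedStores = physicalScanAndResolveLocks
    stores = getStoresMapForGC
    checkedStores = checkLockObservers
    for (store : stores) {
        if checkedStores(store) ok {
            delete the stores that pass the resolvedStores and dirtyStores checks
        } else if registeredStores(store) ok {
            // The store is already registered, and it is dirty because it collected too many locks.
            // Fall back to legacy mode.
            // We cannot remove the lock observer and retry the whole procedure: if the store receives
            // duplicated remove and register requests while locks are being resolved, the store looks
            // clean when it is checked, but the lock observer has dropped some locks.
            // This may result in missing locks.
        }
    }
    if (dirtyStores == 0) {
        break;
    }
}
// if dirtyStores is still not empty after 3 retries
return err: still has %d dirty stores after physical resolve locks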
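The bounded retry in the pseudocode can be sketched in Go. This only models the loop shape (at most 3 rounds, stop early once no dirty store is left, fail otherwise). The `resolveRound` callback and the `cleanOne` helper are hypothetical stand-ins for the register, scan, and check steps, not TiDB functions.

```go
package main

import "fmt"

const maxRetry = 3

// resolveRound is a hypothetical stand-in for one round of
// register -> scan and resolve -> check. It returns the stores that are still dirty.
type resolveRound func(dirty map[uint64]bool) (map[uint64]bool, error)

// resolveWithRetry runs at most maxRetry rounds and fails if dirty stores remain.
func resolveWithRetry(dirty map[uint64]bool, round resolveRound) error {
	for retry := 0; retry < maxRetry; retry++ {
		if len(dirty) == 0 {
			break
		}
		next, err := round(dirty)
		if err != nil {
			return err
		}
		dirty = next
	}
	if len(dirty) != 0 {
		return fmt.Errorf("still has %d dirty stores after physical resolve locks", len(dirty))
	}
	return nil
}

// cleanOne pretends that exactly one store is resolved in each round.
func cleanOne(dirty map[uint64]bool) (map[uint64]bool, error) {
	next := make(map[uint64]bool, len(dirty))
	skipped := false
	for id := range dirty {
		if !skipped {
			skipped = true // this store counts as resolved in this round
			continue
		}
		next[id] = true
	}
	return next, nil
}

func main() {
	fmt.Println(resolveWithRetry(map[uint64]bool{1: true, 2: true}, cleanOne))
	fmt.Println(resolveWithRetry(map[uint64]bool{1: true, 2: true, 3: true, 4: true}, cleanOne))
}
```

With two dirty stores the loop finishes within the retry budget. With four, one store is still dirty after three rounds and the error is returned.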
removeLockObservers
This corresponds to remove_lock_observer on the TiKV side, with the parameter MaxTs: safePoint.
It triggers LockCollectorTask::StopCollecting -> fn stop_collecting on the TiKV side.
getStoresMapForGC
registerLockObservers
It mainly builds a CmdRegisterLockObserver request (RegisterLockObserverRequest) and sends it to TiKV's register_lock_observer interface, which calls start_collecting in TiKV's gc_worker.
Comment on the fn: /// Starts collecting applied locks whose `start_ts` <= `max_ts`. Only one `max_ts` is valid at one time.
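To make the observer's contract concrete, here is a toy model in Go. It is not TiKV's implementation. The `lockObserver` type, its `limit` cap, and the choice to discard earlier state when a new collection starts are all assumptions made for illustration. The only parts taken from the text are that locks are collected when `start_ts` <= `max_ts`, that a single `max_ts` is in effect at a time, and that a store becomes dirty when it collects too many locks.

```go
package main

import "fmt"

// lockObserver is a toy model of a per-store lock observer.
type lockObserver struct {
	maxTS    uint64   // the single max_ts currently in effect
	limit    int      // hypothetical cap on the number of collected locks
	startTSs []uint64 // start_ts of the locks collected so far
	dirty    bool     // set once the cap is exceeded; collected locks are then incomplete
}

// start begins a new collection. Assumption: previous state is discarded.
func (o *lockObserver) start(maxTS uint64) {
	o.maxTS, o.startTSs, o.dirty = maxTS, nil, false
}

// onApplied is called for every applied lock and keeps those with start_ts <= max_ts.
func (o *lockObserver) onApplied(startTS uint64) {
	if startTS > o.maxTS || o.dirty {
		return
	}
	if len(o.startTSs) >= o.limit {
		o.dirty = true // too many locks: the observer can no longer vouch for this store
		return
	}
	o.startTSs = append(o.startTSs, startTS)
}

func main() {
	o := &lockObserver{limit: 2}
	o.start(100)
	for _, ts := range []uint64{50, 150, 60, 70} {
		o.onApplied(ts)
	}
	fmt.Println("collected:", o.startTSs, "dirty:", o.dirty)
}
```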
physicalScanAndResolveLocks
Comment:
// physicalScanAndResolveLocks performs physical scan lock and resolves these locks. Returns successful stores
scanner = newMergeLockScanner(safepoint, tikv client, stores)   // build the scanner
scanner.start(
    for every store, call physicalScanLocksForStore(ctx, s.safePoint, store1, ch)
    this corresponds to physical_scan_lock on the TiKV side
    internally: gcworker -> GcTask::PhysicalScanLock -> handle_physical_scan_lock
)
for(stores){
resolveLocksAcrossRegions(ctx,lock)
}
tikv gc_worker.rs handle_physical_scan_lock
It scans CF_LOCK with the filter ts <= max_ts and parses the entries into Lock objects.
/// Scan locks that satisfies `filter(lock)` returns true, from the given start key `start`.
/// At most `limit` locks will be returned. If `limit` is set to `0`, it means unlimited.
///
/// The return type is `(locks, is_remain)`. `is_remain` indicates whether there MAY be
/// remaining locks that can be scanned.
reader.scan_locks
The doc comment above describes reader.scan_locks. The filter passed in here is: |l| l.ts <= max_ts
lock_cursor = CursorBuilder::new(&self.snapshot, CF_LOCK)
while(seek.next ){
Lock::parse(value:&[u8])
}
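The contract of scan_locks (filter, start key, limit, and the `is_remain` flag) can be sketched in Go over an in-memory slice. The `Lock` struct and the `scanLocks` function are hypothetical. The real code walks a cursor over CF_LOCK, whereas this sketch assumes the locks are already sorted by key.

```go
package main

import "fmt"

// Lock is a hypothetical, simplified lock record.
type Lock struct {
	Key string
	TS  uint64
}

// scanLocks walks `all` (assumed sorted by key) from `start`, keeps the locks
// accepted by filter, and stops once `limit` locks are gathered (0 = unlimited).
// The second result reports whether there may be more locks left to scan.
func scanLocks(all []Lock, start string, limit int, filter func(Lock) bool) ([]Lock, bool) {
	var out []Lock
	for _, l := range all {
		if l.Key < start || !filter(l) {
			continue
		}
		out = append(out, l)
		if limit > 0 && len(out) == limit {
			return out, true // limit reached: more matching locks may follow
		}
	}
	return out, false
}

func main() {
	all := []Lock{{"a", 5}, {"b", 20}, {"c", 7}, {"d", 3}}
	maxTS := uint64(10)
	filter := func(l Lock) bool { return l.TS <= maxTS }

	locks, isRemain := scanLocks(all, "", 2, filter)
	fmt.Println(locks, isRemain)
}
```

When `isRemain` is true, a caller would scan again from just after the last returned key.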
Internally, resolveLocksAcrossRegions sends a ResolveLockRequest to TiKV,
which is then handled by ResolveLock:
impl<S: Snapshot, L: LockManager> WriteCommand<S, L> for ResolveLock
It contains cleanup
ResolveLockLite
cleanup is documented as "Cleanup the lock if it's TTL has expired":
/// Cleanup the lock if it's TTL has expired, comparing with `current_ts`. If `current_ts` is 0,
/// cleanup the lock without checking TTL. If the lock is the primary lock of a pessimistic
/// transaction, the rollback record is protected from being collapsed.
///
/// Returns the released lock. Returns error if the key is locked or has already been
/// committed.
cleanup -> check_txn_status_lock_exists -> finally txn.unlock_key
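The TTL rule from the cleanup doc comment can be sketched in Go. Two things here are assumptions rather than facts from the text: that a timestamp packs physical milliseconds above 18 logical bits, and that a lock counts as expired only when its physical start time plus TTL is strictly less than the physical part of `current_ts`. The helper names are hypothetical.

```go
package main

import "fmt"

// Assumption: a timestamp is (physical milliseconds << 18) | logical counter.
const physicalShiftBits = 18

// composeTS builds a timestamp from its physical and logical parts.
func composeTS(physicalMs, logical uint64) uint64 {
	return physicalMs<<physicalShiftBits | logical
}

// physical extracts the millisecond part of a timestamp.
func physical(ts uint64) uint64 { return ts >> physicalShiftBits }

// shouldCleanup reports whether the lock may be cleaned up.
// currentTS == 0 means "clean up without checking TTL".
func shouldCleanup(lockTS, ttlMs, currentTS uint64) bool {
	if currentTS == 0 {
		return true
	}
	return physical(lockTS)+ttlMs < physical(currentTS)
}

func main() {
	lockTS := composeTS(1000, 0) // lock written at t = 1000 ms with a 3000 ms TTL
	fmt.Println(shouldCleanup(lockTS, 3000, 0))
	fmt.Println(shouldCleanup(lockTS, 3000, composeTS(2000, 0)))
	fmt.Println(shouldCleanup(lockTS, 3000, composeTS(4001, 0)))
}
```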
checkLockObservers
This calls check_lock_observer on the TiKV side -> gc_worker.get_collected_locks
Comment:
// checkLockObservers checks the state of each store's lock observer. If any lock collected by the observers, resolve them. Returns ids of clean stores.
removeLockObservers
// removeLockObservers runs when the function finishes
defer w.removeLockObservers(ctx, safePoint, registeredStores)
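(In Go, a defer statement postpones the statement that follows it. When the function that owns the defer is about to return, the deferred statements run in the reverse order in which they were deferred: the first deferred statement runs last, and the last deferred statement runs first.)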
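A small standalone example of that ordering, unrelated to the GC worker code itself (the `run` function is made up for illustration):

```go
package main

import "fmt"

// run records the order in which its body and its deferred calls execute.
// The named result lets the deferred closures append to the returned slice.
func run() (order []string) {
	defer func() { order = append(order, "first deferred") }()
	defer func() { order = append(order, "second deferred") }()
	order = append(order, "body")
	return order
}

func main() {
	for _, step := range run() {
		fmt.Println(step)
	}
}
```

The body runs first, then the deferred calls in reverse order of registration. This is why a cleanup such as removeLockObservers, deferred at the top of a function, still runs on every return path.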