Distributed Locks in MinIO Object Reads and Writes

Original article, last modified 2023-12-05 23:05:24


License: CC 4.0 BY-SA

Tags: #distributed #golang

First published 2023-12-04 22:31:01


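This article describes the distributed locking mechanism in the MinIO object storage system: the lock interface definition, its local and remote implementations, the related structs (erasureSets, erasureObjects, DRWMutex, and others), and how locks are initialized, created, acquired, and released.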

1. Background

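For object read, write, and delete operations, MinIO lets the caller specify whether a lock should be taken, which keeps concurrent reads and writes of the same object mutually exclusive. Object read/write locking is enabled by default. This article does not go into how the locks are implemented internally; it only walks through how locks are initialized and used.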

2. Distributed Lock Interface

2.1 Interface Definition

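MinIO defines lock behavior through an abstract interface, NetLocker, declared in internal/dsync/locker.go in the MinIO source tree. The definition is shown below; see the MinIO source for the purpose of each method and its detailed comments.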

type NetLocker interface {
	RLock(ctx context.Context, args LockArgs) (bool, error)
	Lock(ctx context.Context, args LockArgs) (bool, error)
	RUnlock(ctx context.Context, args LockArgs) (bool, error)
	Unlock(ctx context.Context, args LockArgs) (bool, error)
	Refresh(ctx context.Context, args LockArgs) (bool, error)
	ForceUnlock(ctx context.Context, args LockArgs) (bool, error)
	String() string
	Close() error
	IsOnline() bool
	IsLocal() bool
}

2.2 Implementations

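Any MinIO node can serve object reads and writes, but each individual request for an object is handled by a single node: the one that received it. For example, when object A is written to a 3-node cluster, one node erasure-codes A, splits the object data (including the parity data) into shards, and writes the shards to the nodes in parallel. If A is written through node 1, then for object A node 1 is the local node and nodes 2 and 3 are remote nodes. MinIO therefore implements the distributed lock interface with two structs, one for the local node and one for remote nodes.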

No.	Struct	Description
1	localLocker	Stores the lock records held on the local node
2	lockRESTClient	Client for the lock REST service on a remote node

3. Related Structs

3.1 erasureSets

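The erasureSets struct holds the information for all the sets in a single pool. Its erasureLockers field is a two-dimensional slice that stores, indexed by set number, all the lockers of each set. erasureLockers is initialized in the newErasureSets function, as shown below.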

// Initialize the erasure sets instance.
s := &erasureSets{
	sets:               make([]*erasureObjects, setCount),
	erasureDisks:       make([][]StorageAPI, setCount),
	erasureLockers:     make([][]dsync.NetLocker, setCount),
	erasureLockOwner:   globalLocalNodeName,
	endpoints:          endpoints,
	endpointStrings:    endpointStrings,
	setCount:           setCount,
	setDriveCount:      setDriveCount,
	defaultParityCount: defaultParityCount,
	format:             format,
	setReconnectEvent:  make(chan int),
	distributionAlgo:   format.Erasure.DistributionAlgo,
	deploymentID:       uuid.MustParse(format.ID),
	poolIndex:          poolIdx,
}

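The sets field of erasureSets is a slice that stores, indexed by set number, pointers to erasureObjects instances. Each erasureObjects instance provides object operations (read, write, delete, and so on) for its set.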

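erasureSets provides the GetLockers method for getting all the lockers of a given set. GetLockers does not return the lockers directly: it returns a closure that, each time it is called, copies the set's locker pointers into a new slice and returns that slice together with the lock owner (erasureLockOwner). The code is shown below.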

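// GetLockers takes a set index and returns a function that yields a copy of the
// slice of lockers held by that set. A set has one locker per node it spans;
// a locker is either local (localLocker) or remote (lockRESTClient).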
func (s *erasureSets) GetLockers(setIndex int) func() ([]dsync.NetLocker, string) {
	return func() ([]dsync.NetLocker, string) {
		lockers := make([]dsync.NetLocker, len(s.erasureLockers[setIndex]))
		copy(lockers, s.erasureLockers[setIndex])
		return lockers, s.erasureLockOwner
	}
}
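The copy matters because each caller gets its own slice, so modifying it cannot corrupt erasureLockers. The sketch below reproduces the method with minimal stand-in types (NetLocker reduced to String(), a string-backed namedLocker); these types are assumptions for illustration only.

```go
package main

// NetLocker is reduced to String() for this sketch.
type NetLocker interface{ String() string }

// namedLocker is a hypothetical locker identified only by its name.
type namedLocker string

func (n namedLocker) String() string { return string(n) }

// erasureSets keeps only the two fields GetLockers reads.
type erasureSets struct {
	erasureLockers   [][]NetLocker
	erasureLockOwner string
}

// GetLockers returns a closure; every call copies the set's current locker slice.
func (s *erasureSets) GetLockers(setIndex int) func() ([]NetLocker, string) {
	return func() ([]NetLocker, string) {
		lockers := make([]NetLocker, len(s.erasureLockers[setIndex]))
		copy(lockers, s.erasureLockers[setIndex])
		return lockers, s.erasureLockOwner
	}
}
```

Because the closure reads s.erasureLockers at call time, whoever holds it always sees the set's current lockers, while the copy keeps the shared slice private.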

3.2 erasureObjects

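An erasureObjects instance provides the ability to operate on objects. When erasureObjects is instantiated, the function returned by erasureSets' GetLockers method for that set is assigned to its getLockers field. During an object operation, calling getLockers returns all the lockers of the set. erasureObjects instances are initialized in the newErasureSets function, as shown below.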

// Initialize erasure objects for a given set.
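// Sets up a single set: its index, the index of its pool, and its drive count.
// getDisks returns the StorageAPI implementations of all the set's drives.
// getLockers returns all the lockers of the current set.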
s.sets[i] = &erasureObjects{
	setIndex:              i,
	poolIndex:             poolIdx,
	setDriveCount:         setDriveCount,
	defaultParityCount:    defaultParityCount,
	getDisks:              s.GetDisks(i),
	getLockers:            s.GetLockers(i),
	getEndpoints:          s.GetEndpoints(i),
	deletedCleanupSleeper: newDynamicSleeper(10, 2*time.Second),
	nsMutex:               mutex,
	bp:                    bp,
	bpOld:                 bpOld,
}

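The nsMutex field of erasureObjects holds a pointer to an nsLockMap instance, which newErasureSets creates with the newNSLock function. newNSLock creates an nsLockMap instance and returns a pointer to it. Because newErasureSets creates it once and passes the same pointer to every set, the nsMutex fields of all the sets' erasureObjects instances in the pool point to the same nsLockMap instance.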

3.3 nsLockMap

nsLockMap provides the NewNSLock method, which in a distributed cluster returns a pointer to a distLockInstance. distLockInstance implements the RWLocker interface. The RWLocker interface and the distLockInstance struct are shown below.

// RWLocker - locker interface to introduce GetRLock, RUnlock.
type RWLocker interface {
	GetLock(ctx context.Context, timeout *dynamicTimeout) (lkCtx LockContext, timedOutErr error)
	Unlock(cancel context.CancelFunc)
	GetRLock(ctx context.Context, timeout *dynamicTimeout) (lkCtx LockContext, timedOutErr error)
	RUnlock(cancel context.CancelFunc)
}

type distLockInstance struct {
	rwMutex *dsync.DRWMutex
	opsID   string
}

Each call to NewNSLock generates a random UUID and assigns it to the opsID field.

The rwMutex field holds a pointer to a dsync.DRWMutex, created by the dsync.NewDRWMutex function.
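A minimal sketch of what NewNSLock sets up, assuming simplified types: each call builds a new distLockInstance whose DRWMutex records the lock names, plus a fresh random UUID as opsID. Joining volume and path with path.Join and generating the UUID by hand with crypto/rand are assumptions made to keep the example self-contained; the real code uses MinIO's own helpers and also passes in the set's lockers.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"path"
)

// DRWMutex is a placeholder that keeps only the lock names.
type DRWMutex struct{ Names []string }

type distLockInstance struct {
	rwMutex *DRWMutex
	opsID   string
}

// nsLockMap is an empty placeholder; the real struct carries more state.
type nsLockMap struct{}

// newOpsID returns a random RFC 4122 version-4 UUID string (hypothetical helper).
func newOpsID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// NewNSLock builds one lock instance per call, each with its own opsID.
func (n *nsLockMap) NewNSLock(volume string, paths ...string) *distLockInstance {
	names := make([]string, 0, len(paths))
	for _, p := range paths {
		names = append(names, path.Join(volume, p)) // assumption: "bucket/object" naming
	}
	return &distLockInstance{rwMutex: &DRWMutex{Names: names}, opsID: newOpsID()}
}
```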

3.4 DRWMutex & Dsync

DRWMutex is the struct that implements the distributed read-write mutex. Its code is shown below.

type DRWMutex struct {
	Names         []string
	writeLocks    []string // Array of nodes that granted a write lock
	readLocks     []string // Array of array of nodes that granted reader locks
	rng           *rand.Rand
	m             sync.Mutex // Mutex to prevent multiple simultaneous locks from this node
	clnt          *Dsync
	cancelRefresh context.CancelFunc
}

// Dsync represents dsync client object which is initialized with
// authenticated clients, used to initiate lock REST calls.
type Dsync struct {
	// List of rest client objects, one per lock server.
	GetLockers func() ([]NetLocker, string)
}

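The two most important fields of DRWMutex are Names, which holds the paths of the objects being locked, and clnt, which holds a pointer to a Dsync instance.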

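Dsync has a single field, GetLockers, which is function-typed. When a Dsync is instantiated, the function passed in is the closure returned by erasureSets' GetLockers method for the set.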
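The wiring can be sketched as follows, with stand-in types reduced to the fields named above. The helper currentLockers is hypothetical; it only illustrates that a DRWMutex does not store lockers itself but asks its Dsync client, and through it the set's GetLockers closure, whenever it needs them.

```go
package main

// NetLocker is reduced to String() for this sketch.
type NetLocker interface{ String() string }

// hostLocker is a hypothetical locker identified by its host.
type hostLocker string

func (h hostLocker) String() string { return string(h) }

// Dsync mirrors the single function-typed field shown above.
type Dsync struct {
	GetLockers func() ([]NetLocker, string)
}

// DRWMutex keeps only Names and clnt.
type DRWMutex struct {
	Names []string
	clnt  *Dsync
}

// NewDRWMutex stores the client and the names of the resources to lock.
func NewDRWMutex(clnt *Dsync, names ...string) *DRWMutex {
	return &DRWMutex{Names: names, clnt: clnt}
}

// currentLockers (hypothetical) resolves the set's lockers through clnt on demand.
func (dm *DRWMutex) currentLockers() []NetLocker {
	lockers, _ := dm.clnt.GetLockers()
	return lockers
}
```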

4. Lock Operations

4.1 Lock Initialization

When the MinIO server starts, it calls newErasureSets, which initializes the lockers with the following code.

erasureLockers := map[string]dsync.NetLocker{}
for _, endpoint := range endpoints.Endpoints {
	if _, ok := erasureLockers[endpoint.Host]; !ok {
		erasureLockers[endpoint.Host] = newLockAPI(endpoint)
	}
}

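The loop iterates over all the endpoints of a single pool to create a locker for every node. Note that there is one locker per node (the map is keyed by endpoint.Host), not one per endpoint. The temporary variable erasureLockers holds pointers to the locker instances of all the nodes.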

newLockAPI creates the appropriate locker depending on whether the endpoint is marked as local (endpoint.IsLocal) and returns a pointer to it:

func newLockAPI(endpoint Endpoint) dsync.NetLocker {
	if endpoint.IsLocal {
		return globalLockServer
	}
	return newlockRESTClient(endpoint)
}
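A self-contained sketch of the local/remote decision, with the types cut down to what newLockAPI touches. The localLocker and lockRESTClient bodies here are placeholders (assumptions), not MinIO's implementations; the point is that every local endpoint maps to the same globalLockServer instance, while each remote endpoint gets its own client.

```go
package main

// NetLocker is reduced to the two methods this sketch needs.
type NetLocker interface {
	IsLocal() bool
	String() string
}

// Endpoint keeps only the fields newLockAPI looks at.
type Endpoint struct {
	Host    string
	IsLocal bool
}

// localLocker placeholder: in MinIO it holds this node's lock records.
type localLocker struct {
	lockMap map[string]string
}

func (l *localLocker) IsLocal() bool  { return true }
func (l *localLocker) String() string { return "local" }

// lockRESTClient placeholder: in MinIO it forwards lock calls to a remote node.
type lockRESTClient struct {
	host string
}

func (c *lockRESTClient) IsLocal() bool  { return false }
func (c *lockRESTClient) String() string { return c.host }

// globalLockServer: a single local locker shared by the whole process.
var globalLockServer = &localLocker{lockMap: map[string]string{}}

func newlockRESTClient(endpoint Endpoint) *lockRESTClient {
	return &lockRESTClient{host: endpoint.Host}
}

// newLockAPI mirrors the function above.
func newLockAPI(endpoint Endpoint) NetLocker {
	if endpoint.IsLocal {
		return globalLockServer
	}
	return newlockRESTClient(endpoint)
}
```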

globalLockServer is created while the MinIO server registers its routes during startup, and holds a pointer to a localLocker. newlockRESTClient creates a lockRESTClient from the endpoint's host and returns a pointer to it.

After the lockers of all the nodes have been created, the following code stores the corresponding lockers per set.

for i := 0; i < setCount; i++ {
	lockerEpSet := set.NewStringSet()
	for j := 0; j < setDriveCount; j++ {
		endpoint := endpoints.Endpoints[i*setDriveCount+j]
		// Only add one locker per endpoint host and per erasure set.
		if locker, ok := erasureLockers[endpoint.Host]; ok && !lockerEpSet.Contains(endpoint.Host) {
			lockerEpSet.Add(endpoint.Host)
			s.erasureLockers[i] = append(s.erasureLockers[i], locker)
		}
	}
}

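In this code, s is a pointer to the erasureSets instance, and its erasureLockers field is a two-dimensional slice that stores pointers to each set's lockers. Within a set, each host's locker is added at most once (tracked by lockerEpSet), so a set gets one locker per node it spans. The field s.erasureLockers has the same name as the temporary variable erasureLockers above; take care not to confuse the two.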
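The two loops can be reproduced in isolation to check the result. In this sketch lockers are represented by label strings and a map[string]bool replaces MinIO's set.StringSet; both are simplifications. The example assumes 2 nodes with 4 drives each, with endpoints interleaved so that every set spans both nodes; under that assumption each set ends up with exactly one locker per node.

```go
package main

// Endpoint keeps only the host, which is all the grouping logic uses.
type Endpoint struct{ Host string }

// buildSetLockers mimics the two loops in newErasureSets.
func buildSetLockers(endpoints []Endpoint, setCount, setDriveCount int) [][]string {
	// Step 1: one locker per host (the "locker" is just a label here).
	hostLockers := map[string]string{}
	for _, ep := range endpoints {
		if _, ok := hostLockers[ep.Host]; !ok {
			hostLockers[ep.Host] = "locker@" + ep.Host
		}
	}

	// Step 2: for each set, add each host's locker at most once.
	setLockers := make([][]string, setCount)
	for i := 0; i < setCount; i++ {
		seen := map[string]bool{}
		for j := 0; j < setDriveCount; j++ {
			ep := endpoints[i*setDriveCount+j]
			if locker, ok := hostLockers[ep.Host]; ok && !seen[ep.Host] {
				seen[ep.Host] = true
				setLockers[i] = append(setLockers[i], locker)
			}
		}
	}
	return setLockers
}
```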

4.2 Creating a Distributed Lock

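During an object operation, a distributed lock is created by calling the NewNSLock method of erasureObjects. In a distributed deployment this creates a distLockInstance and returns a pointer to it; because distLockInstance implements the RWLocker interface, it can be used to acquire and release the lock.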

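The NewNSLock method of erasureObjects actually calls the NewNSLock method of its nsMutex field, passing in its getLockers field, that is, the function obtained from erasureSets' GetLockers method. Through that function, the lock can obtain the lockers of all the nodes in the current set (pointers to localLocker or lockRESTClient instances).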

4.3 Acquiring and Releasing a Distributed Lock

Once a distLockInstance has been created, the lock is acquired by calling its GetLock (write lock) or GetRLock (read lock) method. These call the GetLock and GetRLock methods of the rwMutex field (a DRWMutex), which in turn call DRWMutex's lockBlocking method. The lock function called from lockBlocking is what ultimately performs the locking.

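The lock function concurrently calls the NetLocker methods of the lockers it is given. For the current node's locker, this is the Lock or RLock method of localLocker; for other nodes' lockers, it is the Lock or RLock method of lockRESTClient. lockRESTClient's Lock and RLock send the request over the REST API to the target node, where it is again handled by that node's localLocker Lock or RLock method.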

Releasing a distributed lock follows the same flow as acquiring one.

Detailed code annotations:

GitHub - luo2pei4/minio: High Performance, Kubernetes Native Object Storage