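To implement atomic commit of transactions, SQLite defines four types of locks. The locks discussed here are SQLite-level locks. They are built on the lock primitives of the underlying operating system, and those primitives may differ between operating systems or even between versions of the same operating system. From the point of view of a user process, the SQLite locks should hide these differences as far as possible. This article covers the SQLite locking mechanism and the lock primitives of POSIX operating systems; it does not analyze the Windows lock primitives.
On POSIX systems, the lock primitive that SQLite relies on is described by the following structure (made available through the fcntl.h header):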
struct flock
{
    short int l_type;    /* Type of lock: F_RDLCK, F_WRLCK, or F_UNLCK. */
    short int l_whence;  /* Where `l_start' is relative to (like `lseek'). */
#ifndef __USE_FILE_OFFSET64
    __off_t l_start;     /* Offset where the lock begins. */
    __off_t l_len;       /* Size of the locked area; zero means until EOF. */
#else
    __off64_t l_start;   /* Offset where the lock begins. */
    __off64_t l_len;     /* Size of the locked area; zero means until EOF. */
#endif
    __pid_t l_pid;       /* Process holding the lock. */
};
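A few points to note about the locks provided by the underlying POSIX operating system:
- They are advisory locks, not mandatory locks. They only have an effect if every caller uses them in a consistent way.
- The l_type field of struct flock gives the lock type. A held lock is either a read lock or a write lock. Multiple processes holding read locks can read concurrently, and other processes can still acquire new read locks. At any given time only one process can hold a write lock; it is exclusive.
- The l_whence, l_start and l_len fields of struct flock define the byte range the lock applies to. This range does not have to lie within the current length of the file.
The fcntl system call queries or sets the lock state. Its lock-related commands are: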
# define F_GETLK 5 /* Get record locking info. */
# define F_SETLK 6 /* Set record locking info (non-blocking). */
# define F_SETLKW 7 /* Set record locking info (blocking). */
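As a minimal sketch of how these commands are used, the helpers below set a non-blocking lock on a byte range with F_SETLK and query for a conflicting lock with F_GETLK. The names set_range_lock and query_conflict are made up for this illustration and are not part of SQLite.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to set (or clear) an advisory lock on the byte
** range [start, start+len) without blocking. type is F_RDLCK, F_WRLCK or
** F_UNLCK. Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_range_lock(int fd, short type, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;
  lk.l_whence = SEEK_SET;   /* l_start is relative to the start of the file */
  lk.l_start = start;
  lk.l_len = len;
  return fcntl(fd, F_SETLK, &lk);
}

/* Hypothetical helper: ask whether a write lock on the range would conflict
** with a lock held by ANOTHER process. Returns the l_type reported by
** F_GETLK (F_UNLCK means no conflict), or -1 if the query itself failed. */
static int query_conflict(int fd, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = F_WRLCK;
  lk.l_whence = SEEK_SET;
  lk.l_start = start;
  lk.l_len = len;
  if( fcntl(fd, F_GETLK, &lk)==-1 ) return -1;
  return lk.l_type;
}
```

Note that F_GETLK only reports locks held by other processes. A process never conflicts with its own locks, so querying a range that the caller itself has locked yields F_UNLCK.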
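On top of these primitives SQLite defines four locks:
- SHARED_LOCK: the read lock. Multiple processes can hold it at the same time, which allows concurrent reads.
- RESERVED_LOCK: a write lock that announces the intention to write. At any given time only one process can hold it. Other processes that already hold a SHARED_LOCK can keep reading, and other processes can still acquire new SHARED_LOCKs.
- PENDING_LOCK: a write lock. Processes that already hold a SHARED_LOCK can keep reading, but no process can acquire a new SHARED_LOCK.
- EXCLUSIVE_LOCK: a write lock. At any given time only one process can hold it, and it prevents other processes from acquiring any type of lock. In other words, it can only be obtained when no other process holds any lock.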
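The rules in this list can be written down as a small compatibility check. This is only a model of the description above, not code from SQLite; the function name other_process_allows is hypothetical. The numeric lock levels are the ones SQLite uses (0 to 4 in order of strictness).

```c
#include <assert.h>

/* Lock levels in order of increasing strictness. */
enum { NO_LOCK=0, SHARED_LOCK=1, RESERVED_LOCK=2, PENDING_LOCK=3, EXCLUSIVE_LOCK=4 };

/* Hypothetical model: given the strictest lock currently held by any OTHER
** process, report whether this process could be granted the level `wanted`.
** Returns 1 if it could, 0 if not. */
static int other_process_allows(int held_by_other, int wanted){
  switch( wanted ){
    case SHARED_LOCK:
      /* New readers are admitted until some process reaches PENDING. */
      return held_by_other < PENDING_LOCK;
    case RESERVED_LOCK:
      /* Only one process may hold RESERVED (or anything stricter). */
      return held_by_other < RESERVED_LOCK;
    case EXCLUSIVE_LOCK:
      /* Requires that no other process holds any lock at all. */
      return held_by_other == NO_LOCK;
    default:
      /* PENDING is never requested directly; NO_LOCK is not a request. */
      return 0;
  }
}
```

For example, while another process holds RESERVED_LOCK a new reader is still admitted, but a second writer is not.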
Two remarks about these four SQLite-level locks:
- The four locks are ordered by increasing strictness, and transitions between lock states follow that order. The possible transition paths are:
UNLOCKED -> SHARED
SHARED -> RESERVED
SHARED -> (PENDING) -> EXCLUSIVE
RESERVED -> (PENDING) -> EXCLUSIVE
PENDING -> EXCLUSIVE
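- PENDING_LOCK is only an intermediate state. Callers of the SQLite interface function sqlite3OsLock never request it directly. A file handle ends up in the PENDING state when it requests EXCLUSIVE but cannot obtain the EXCLUSIVE lock immediately.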
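The transition rules can be summarized as a validity check on lock requests. The sketch below mirrors the three assertions near the top of unixLock (a handle with no lock may only request SHARED, PENDING is never requested, RESERVED is only requested from SHARED); the function name request_is_valid is an assumption made for illustration.

```c
#include <assert.h>

enum { NO_LOCK=0, SHARED_LOCK=1, RESERVED_LOCK=2, PENDING_LOCK=3, EXCLUSIVE_LOCK=4 };

/* Hypothetical helper: returns 1 if a handle currently at level `from`
** may request an upgrade to level `to`, 0 otherwise. */
static int request_is_valid(int from, int to){
  if( to<=from ) return 0;                           /* not an upgrade */
  if( to==PENDING_LOCK ) return 0;                   /* never requested directly */
  if( from==NO_LOCK ) return to==SHARED_LOCK;        /* must start with SHARED */
  if( to==RESERVED_LOCK ) return from==SHARED_LOCK;  /* RESERVED only from SHARED */
  return to==EXCLUSIVE_LOCK;                         /* from SHARED, RESERVED or PENDING */
}
```

In unixLock itself a request for a level that is already held is not an error; it returns SQLITE_OK early. The helper above only classifies genuine upgrades.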
To implement these four locks, SQLite defines the byte ranges that the lock operations act on (in the os.h header):
#ifndef SQLITE_TEST
#define PENDING_BYTE      0x40000000  /* First byte past the 1GB boundary */
#else
extern unsigned int sqlite3_pending_byte;
#define PENDING_BYTE sqlite3_pending_byte
#endif

#define RESERVED_BYTE     (PENDING_BYTE+1)
#define SHARED_FIRST      (PENDING_BYTE+2)
#define SHARED_SIZE       510
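Notes on these lock ranges:
- RESERVED_LOCK operates on the single byte at offset RESERVED_BYTE. PENDING_LOCK operates on the single byte at offset PENDING_BYTE. SHARED_LOCK and EXCLUSIVE_LOCK both operate on the 510 bytes starting at offset SHARED_FIRST.
- On POSIX systems SHARED_LOCK and EXCLUSIVE_LOCK could work on a single byte. Some older versions of Windows, however, do not support read locks, so two readers locking the same byte would exclude each other. To stay compatible with those lock primitives, the shared range is made large enough that each reader can lock a randomly chosen byte in it, which preserves concurrent reads.
- The lock ranges take up 512 bytes in total (1 byte for PENDING_BYTE, 1 byte for RESERVED_BYTE, and 510 bytes for SHARED_LOCK and EXCLUSIVE_LOCK), which fits inside a single page. The range starts at PENDING_BYTE. It holds no actual data, so once the file is large enough to reach that offset, it contains a hole there.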
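The mapping from lock type to byte range can be sketched as follows. The struct lock_range and the functions range_for and lock_page are hypothetical names introduced here; lock_page additionally assumes that pages are numbered starting from 1.

```c
#include <assert.h>

#define PENDING_BYTE   0x40000000        /* first byte past the 1GB boundary */
#define RESERVED_BYTE  (PENDING_BYTE+1)
#define SHARED_FIRST   (PENDING_BYTE+2)
#define SHARED_SIZE    510

enum { NO_LOCK=0, SHARED_LOCK=1, RESERVED_LOCK=2, PENDING_LOCK=3, EXCLUSIVE_LOCK=4 };

/* Hypothetical type: the byte range a lock type operates on. */
struct lock_range { long start; long len; };

/* Hypothetical helper: byte range for a lock type; {0,0} for NO_LOCK. */
static struct lock_range range_for(int locktype){
  struct lock_range r = {0, 0};
  switch( locktype ){
    case PENDING_LOCK:   r.start = PENDING_BYTE;  r.len = 1;           break;
    case RESERVED_LOCK:  r.start = RESERVED_BYTE; r.len = 1;           break;
    case SHARED_LOCK:    /* same range, taken with a read lock */
    case EXCLUSIVE_LOCK: r.start = SHARED_FIRST;  r.len = SHARED_SIZE; break;
    default: break;
  }
  return r;
}

/* Hypothetical helper: 1-based number of the page that contains the lock
** bytes, assuming page numbering starts at 1. */
static long lock_page(long page_size){
  return PENDING_BYTE/page_size + 1;
}
```

SHARED_LOCK and EXCLUSIVE_LOCK share one range; what distinguishes them at the POSIX level is the lock type (read lock versus write lock) placed on it.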
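SQLite uses locks when working with both the rollback journal file and the database file. In the SQLite source, the function that performs lock state transitions with the POSIX lock primitives is unixLock: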
1365 static int unixLock(OsFile *id, int locktype){
1366 /* The following describes the implementation of the various locks and
1367 ** lock transitions in terms of the POSIX advisory shared and exclusive
1368 ** lock primitives (called read-locks and write-locks below, to avoid
1369 ** confusion with SQLite lock names). The algorithms are complicated
1370 ** slightly in order to be compatible with windows systems simultaneously
1371 ** accessing the same database file, in case that is ever required.
1372 **
1373 ** Symbols defined in os.h indentify the 'pending byte' and the 'reserved
1374 ** byte', each single bytes at well known offsets, and the 'shared byte
1375 ** range', a range of 510 bytes at a well known offset.
1376 **
1377 ** To obtain a SHARED lock, a read-lock is obtained on the 'pending
1378 ** byte'. If this is successful, a random byte from the 'shared byte
1379 ** range' is read-locked and the lock on the 'pending byte' released.
1380 **
1381 ** A process may only obtain a RESERVED lock after it has a SHARED lock.
1382 ** A RESERVED lock is implemented by grabbing a write-lock on the
1383 ** 'reserved byte'.
1384 **
1385 ** A process may only obtain a PENDING lock after it has obtained a
1386 ** SHARED lock. A PENDING lock is implemented by obtaining a write-lock
1387 ** on the 'pending byte'. This ensures that no new SHARED locks can be
1388 ** obtained, but existing SHARED locks are allowed to persist. A process
1389 ** does not have to obtain a RESERVED lock on the way to a PENDING lock.
1390 ** This property is used by the algorithm for rolling back a journal file
1391 ** after a crash.
1392 **
1393 ** An EXCLUSIVE lock, obtained after a PENDING lock is held, is
1394 ** implemented by obtaining a write-lock on the entire 'shared byte
1395 ** range'. Since all other locks require a read-lock on one of the bytes
1396 ** within this range, this ensures that no other locks are held on the
1397 ** database.
1398 **
1399 ** The reason a single byte cannot be used instead of the 'shared byte
1400 ** range' is that some versions of windows do not support read-locks. By
1401 ** locking a random byte from a range, concurrent SHARED locks may exist
1402 ** even if the locking primitive used is always a write-lock.
1403 */
1404 int rc = SQLITE_OK;
1405 unixFile *pFile = (unixFile*)id;
1406 struct lockInfo *pLock = pFile->pLock;
1407 struct flock lock;
1408 int s;
1409
1410 assert( pFile );
1411 OSTRACE7("LOCK %d %s was %s(%s,%d) pid=%d\n", pFile->h,
1412 locktypeName(locktype), locktypeName(pFile->locktype),
1413 locktypeName(pLock->locktype), pLock->cnt , getpid());
1414
1415 /* If there is already a lock of this type or more restrictive on the
1416 ** OsFile, do nothing. Don't use the end_lock: exit path, as
1417 ** sqlite3OsEnterMutex() hasn't been called yet.
1418 */
1419 if( pFile->locktype>=locktype ){
1420 OSTRACE3("LOCK %d %s ok (already held)\n", pFile->h,
1421 locktypeName(locktype));
1422 return SQLITE_OK;
1423 }
1424
1425 /* Make sure the locking sequence is correct
1426 */
1427 assert( pFile->locktype!=NO_LOCK || locktype==SHARED_LOCK );
1428 assert( locktype!=PENDING_LOCK );
1429 assert( locktype!=RESERVED_LOCK || pFile->locktype==SHARED_LOCK );
1430
1431 /* This mutex is needed because pFile->pLock is shared across threads
1432 */
1433 sqlite3OsEnterMutex();
1434
1435 /* Make sure the current thread owns the pFile.
1436 */
1437 rc = transferOwnership(pFile);
1438 if( rc!=SQLITE_OK ){
1439 sqlite3OsLeaveMutex();
1440 return rc;
1441 }
1442 pLock = pFile->pLock;
1443
1444 /* If some thread using this PID has a lock via a different OsFile*
1445 ** handle that precludes the requested lock, return BUSY.
1446 */
1447 if( (pFile->locktype!=pLock->locktype &&
1448 (pLock->locktype>=PENDING_LOCK || locktype>SHARED_LOCK))
1449 ){
1450 rc = SQLITE_BUSY;
1451 goto end_lock;
1452 }
1453
1454 /* If a SHARED lock is requested, and some thread using this PID already
1455 ** has a SHARED or RESERVED lock, then increment reference counts and
1456 ** return SQLITE_OK.
1457 */
1458 if( locktype==SHARED_LOCK &&
1459 (pLock->locktype==SHARED_LOCK || pLock->locktype==RESERVED_LOCK) ){
1460 assert( locktype==SHARED_LOCK );
1461 assert( pFile->locktype==0 );
1462 assert( pLock->cnt>0 );
1463 pFile->locktype = SHARED_LOCK;
1464 pLock->cnt++;
1465 pFile->pOpen->nLock++;
1466 goto end_lock;
1467 }
1468
1469 lock.l_len = 1L;
1470
1471 lock.l_whence = SEEK_SET;
1472
1473 /* A PENDING lock is needed before acquiring a SHARED lock and before
1474 ** acquiring an EXCLUSIVE lock. For the SHARED lock, the PENDING will
1475 ** be released.
1476 */
1477 if( locktype==SHARED_LOCK
1478 || (locktype==EXCLUSIVE_LOCK && pFile->locktype<PENDING_LOCK)
1479 ){
1480 lock.l_type = (locktype==SHARED_LOCK?F_RDLCK:F_WRLCK);
1481 lock.l_start = PENDING_BYTE;
1482 s = fcntl(pFile->h, F_SETLK, &lock);
1483 if( s==(-1) ){
1484 rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
1485 goto end_lock;
1486 }
1487 }
1488
1489
1490 /* If control gets to this point, then actually go ahead and make
1491 ** operating system calls for the specified lock.
1492 */
1493 if( locktype==SHARED_LOCK ){
1494 assert( pLock->cnt==0 );
1495 assert( pLock->locktype==0 );
1496
1497 /* Now get the read-lock */
1498 lock.l_start = SHARED_FIRST;
1499 lock.l_len = SHARED_SIZE;
1500 s = fcntl(pFile->h, F_SETLK, &lock);
1501
1502 /* Drop the temporary PENDING lock */
1503 lock.l_start = PENDING_BYTE;
1504 lock.l_len = 1L;
1505 lock.l_type = F_UNLCK;
1506 if( fcntl(pFile->h, F_SETLK, &lock)!=0 ){
1507 rc = SQLITE_IOERR_UNLOCK; /* This should never happen */
1508 goto end_lock;
1509 }
1510 if( s==(-1) ){
1511 rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
1512 }else{
1513 pFile->locktype = SHARED_LOCK;
1514 pFile->pOpen->nLock++;
1515 pLock->cnt = 1;
1516 }
1517 }else if( locktype==EXCLUSIVE_LOCK && pLock->cnt>1 ){
1518 /* We are trying for an exclusive lock but another thread in this
1519 ** same process is still holding a shared lock. */
1520 rc = SQLITE_BUSY;
1521 }else{
1522 /* The request was for a RESERVED or EXCLUSIVE lock. It is
1523 ** assumed that there is a SHARED or greater lock on the file
1524 ** already.
1525 */
1526 assert( 0!=pFile->locktype );
1527 lock.l_type = F_WRLCK;
1528 switch( locktype ){
1529 case RESERVED_LOCK:
1530 lock.l_start = RESERVED_BYTE;
1531 break;
1532 case EXCLUSIVE_LOCK:
1533 lock.l_start = SHARED_FIRST;
1534 lock.l_len = SHARED_SIZE;
1535 break;
1536 default:
1537 assert(0);
1538 }
1539 s = fcntl(pFile->h, F_SETLK, &lock);
1540 if( s==(-1) ){
1541 rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
1542 }
1543 }
1544
1545 if( rc==SQLITE_OK ){
1546 pFile->locktype = locktype;
1547 pLock->locktype = locktype;
1548 }else if( locktype==EXCLUSIVE_LOCK ){
1549 pFile->locktype = PENDING_LOCK;
1550 pLock->locktype = PENDING_LOCK;
1551 }
1552
1553 end_lock:
1554 sqlite3OsLeaveMutex();
1555 OSTRACE4("LOCK %d %s %s\n", pFile->h, locktypeName(locktype),
1556 rc==SQLITE_OK ? "ok" : "failed");
1557 return rc;
1558 }
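The second parameter, locktype, is the lock type being requested. The function proceeds as follows:
- Line 1419:1423: if the file handle already holds a lock at least as strict as the requested one, nothing needs to be done and SQLITE_OK is returned.
- Line 1431:1433: enter the mutex. POSIX lock primitives normally operate between processes; to serialize threads inside one process, the shared state must be protected by a mutex.
- Line 1435:1441: if the file handle does not belong to the current thread, transfer ownership of the handle to the current thread.
- Line 1444:1452: if another thread has locked the file through a different file handle and that lock precludes the requested one, return SQLITE_BUSY.
- Line 1454:1467: if another thread already holds a SHARED_LOCK or RESERVED_LOCK on the file through a different file handle and the current thread requests SHARED_LOCK, no call to the POSIX lock primitive is needed. The reference counts are incremented and SQLITE_OK is returned.
- Line 1469 onward: the remaining cases require calls to the underlying lock primitive.
- Line 1471: lock byte ranges are computed from the start of the file.
- Line 1473:1487: a request for SHARED_LOCK or EXCLUSIVE_LOCK must first lock the pending byte, with a read lock for SHARED_LOCK and a write lock for EXCLUSIVE_LOCK.
- Line 1490:1516: set the SHARED_LOCK through the system call. After the attempt to read-lock the shared range, the temporary lock on the pending byte is released.
- Line 1517:1520: if the current thread requests EXCLUSIVE_LOCK while another thread in the same process still holds a shared lock, return SQLITE_BUSY.
- Line 1522:1543: set the RESERVED_LOCK or EXCLUSIVE_LOCK through the system call.
- Line 1545:1547: if the request succeeded, the lock type of the file handle and of the per-inode lock structure are updated to the requested type.
- Line 1548:1550: if the request failed and it was for EXCLUSIVE_LOCK, both are set to PENDING_LOCK. The write lock on the pending byte is already held at this point.
- Line 1554: leave the mutex.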
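Stripped of the bookkeeping for threads and reference counts, the fcntl calls behind the SHARED and EXCLUSIVE transitions can be sketched as below. This is a simplified illustration under stated assumptions, not the SQLite implementation: byte_lock, take_shared and take_exclusive are hypothetical names, and the return codes are invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PENDING_BYTE   0x40000000
#define SHARED_FIRST   (PENDING_BYTE+2)
#define SHARED_SIZE    510

/* Hypothetical helper: non-blocking fcntl lock on [start, start+len).
** Returns 0 on success, -1 on failure. */
static int byte_lock(int fd, short type, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;
  lk.l_whence = SEEK_SET;
  lk.l_start = start;
  lk.l_len = len;
  return fcntl(fd, F_SETLK, &lk);
}

/* UNLOCKED -> SHARED. Returns 0 on success, -1 on failure. */
static int take_shared(int fd){
  int s;
  /* 1. Read-lock the pending byte; this fails while a writer holds it. */
  if( byte_lock(fd, F_RDLCK, PENDING_BYTE, 1)==-1 ) return -1;
  /* 2. Read-lock the shared range. */
  s = byte_lock(fd, F_RDLCK, SHARED_FIRST, SHARED_SIZE);
  /* 3. Drop the temporary lock on the pending byte in either case. */
  if( byte_lock(fd, F_UNLCK, PENDING_BYTE, 1)==-1 ) return -1;
  return s;
}

/* SHARED -> EXCLUSIVE. Returns 0 if EXCLUSIVE was obtained, 1 if only the
** pending byte could be locked (the PENDING state), -1 if nothing changed. */
static int take_exclusive(int fd){
  /* Write-lock the pending byte so that no new readers can enter. */
  if( byte_lock(fd, F_WRLCK, PENDING_BYTE, 1)==-1 ) return -1;
  /* Write-lock the whole shared range; fails while other readers remain. */
  if( byte_lock(fd, F_WRLCK, SHARED_FIRST, SHARED_SIZE)==-1 ) return 1;
  return 0;
}
```

The locked bytes lie at the 1GB offset, far beyond the end of a small file, which POSIX permits. A caller that gets 1 from take_exclusive would retry later, once the remaining readers have released their locks.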
The function transferOwnership, called at Line 1437 to transfer ownership of the file handle, is implemented as:
749 static int transferOwnership(unixFile *pFile){
750 int rc;
751 pthread_t hSelf;
752 if( threadsOverrideEachOthersLocks ){
753 /* Ownership transfers not needed on this system */
754 return SQLITE_OK;
755 }
756 hSelf = pthread_self();
757 if( pthread_equal(pFile->tid, hSelf) ){
758 /* We are still in the same thread */
759 OSTRACE1("No-transfer, same thread\n");
760 return SQLITE_OK;
761 }
762 if( pFile->locktype!=NO_LOCK ){
763 /* We cannot change ownership while we are holding a lock! */
764 return SQLITE_MISUSE;
765 }
766 OSTRACE4("Transfer ownership of %d from %d to %d\n",
767 pFile->h, pFile->tid, hSelf);
768 pFile->tid = hSelf;
769 if (pFile->pLock != NULL) {
770 releaseLockInfo(pFile->pLock);
771 rc = findLockInfo(pFile->h, &pFile->pLock, 0);
772 OSTRACE5("LOCK %d is now %s(%s,%d)\n", pFile->h,
773 locktypeName(pFile->locktype),
774 locktypeName(pFile->pLock->locktype), pFile->pLock->cnt);
775 return rc;
776 } else {
777 return SQLITE_OK;
778 }
779 }
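Line 752:755: if locks set by different threads of the same process override each other, no ownership transfer is needed and the function returns SQLITE_OK directly. Line 756:761: if the thread that owns the file handle is the current thread, nothing needs to be done and SQLITE_OK is returned. Line 762:765: if the file handle already holds a lock, ownership cannot be transferred and SQLITE_MISUSE is returned. Line 768: the owning thread of the file handle is set to the current thread. Line 769:778: the per-inode lock information is refreshed by releasing the old lockInfo and looking it up again for the new owner.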
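The decision logic of transferOwnership can be modeled without any threading calls. In the sketch below thread identities are plain integers and the result codes are invented; struct handle, transfer_owner, XFER_OK and XFER_MISUSE are all hypothetical names, and the lockInfo refresh of the real function is left out.

```c
#include <assert.h>

enum { XFER_OK=0, XFER_MISUSE=1 };   /* hypothetical result codes */

/* Hypothetical, simplified stand-in for the file handle. */
struct handle {
  int owner_tid;   /* integer id of the owning thread (illustration only) */
  int locktype;    /* 0 means no lock is held */
};

/* Model of the ownership rules: returns XFER_OK if the calling thread may
** use the handle, XFER_MISUSE if ownership cannot be transferred. */
static int transfer_owner(struct handle *h, int self_tid, int threads_override){
  if( threads_override ) return XFER_OK;        /* transfers not needed */
  if( h->owner_tid==self_tid ) return XFER_OK;  /* still the same thread */
  if( h->locktype!=0 ) return XFER_MISUSE;      /* cannot move a held lock */
  h->owner_tid = self_tid;                      /* hand the handle over */
  return XFER_OK;
}
```

The key rule is the third check: a handle can only change owner while it holds no lock.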
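threadsOverrideEachOthersLocks is a global variable that records whether locks set by different threads of the same process can override each other. It is defined as: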
/*
** This variable records whether or not threads can override each others
** locks.
**
** 0: No. Threads cannot override each others locks.
** 1: Yes. Threads can override each others locks.
** -1: We don't know yet.
**
** On some systems, we know at compile-time if threads can override each
** others locks. On those systems, the SQLITE_THREAD_OVERRIDE_LOCK macro
** will be set appropriately. On other systems, we have to check at
** runtime. On these latter systems, SQLTIE_THREAD_OVERRIDE_LOCK is
** undefined.
**
** This variable normally has file scope only. But during testing, we make
** it a global so that the test code can change its value in order to verify
** that the right stuff happens in either case.
*/
#ifndef SQLITE_THREAD_OVERRIDE_LOCK
# define SQLITE_THREAD_OVERRIDE_LOCK -1
#endif
#ifdef SQLITE_TEST
int threadsOverrideEachOthersLocks = SQLITE_THREAD_OVERRIDE_LOCK;
#else
static int threadsOverrideEachOthersLocks = SQLITE_THREAD_OVERRIDE_LOCK;
#endif
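If it is known at compile time that, on a given operating system, locks set by different threads of the same process override each other, the macro SQLITE_THREAD_OVERRIDE_LOCK can be defined as 1; if it is known that they do not, it can be defined as 0. If the behavior is not known at compile time, the macro is defined as -1 or left undefined, and the answer has to be determined at run time by a test routine. Note that when SQLite always runs in a single thread, the situation is equivalent to threads overriding each other's locks.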
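The tri-state convention (1, 0, or -1 for "not known yet") lends itself to lazy resolution: the probe runs only if the compile-time value is -1, and only once. The sketch below illustrates that pattern with made-up names (THREAD_OVERRIDE_LOCK, threads_override_locks, fake_probe); it does not show where SQLite itself triggers the probe.

```c
#include <assert.h>

#ifndef THREAD_OVERRIDE_LOCK        /* hypothetical stand-in for the macro */
# define THREAD_OVERRIDE_LOCK -1    /* -1: behavior not known at compile time */
#endif

static int threads_override = THREAD_OVERRIDE_LOCK;
static int probe_calls = 0;         /* counts probe runs, for illustration */

/* Hypothetical probe that pretends the system lets threads override
** each other's locks. A real probe would test this with fcntl. */
static int fake_probe(void){
  probe_calls++;
  return 1;
}

/* Returns 1 or 0, running the probe only while the answer is unknown. */
static int threads_override_locks(int (*probe)(void)){
  if( threads_override<0 ){
    threads_override = probe() ? 1 : 0;
  }
  return threads_override;
}
```

Compiling with the macro set to 0 or 1 skips the probe entirely, which is the compile-time case described above.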
The function that determines at run time whether locks set by different threads of the same process override each other is testThreadLockingBehavior:
498 /*
499 ** This procedure attempts to determine whether or not threads
500 ** can override each others locks then sets the
501 ** threadsOverrideEachOthersLocks variable appropriately.
502 */
503 static void testThreadLockingBehavior(int fd_orig){
504 int fd;
505 struct threadTestData d[2];
506 pthread_t t[2];
507
508 fd = dup(fd_orig);
509 if( fd<0 ) return;
510 memset(d, 0, sizeof(d));
511 d[0].fd = fd;
512 d[0].lock.l_type = F_RDLCK;
513 d[0].lock.l_len = 1;
514 d[0].lock.l_start = 0;
515 d[0].lock.l_whence = SEEK_SET;
516 d[1] = d[0];
517 d[1].lock.l_type = F_WRLCK;
518 pthread_create(&t[0], 0, threadLockingTest, &d[0]);
519 pthread_create(&t[1], 0, threadLockingTest, &d[1]);
520 pthread_join(t[0], 0);
521 pthread_join(t[1], 0);
522 close(fd);
523 threadsOverrideEachOthersLocks = d[0].result==0 && d[1].result==0;
524 }
It uses the data structure threadTestData, defined as:
/*
** This structure holds information passed into individual test
** threads by the testThreadLockingBehavior() routine.
*/
struct threadTestData {
  int fd;                /* File to be locked */
  struct flock lock;     /* The locking operation */
  int result;            /* Result of the locking operation */
};
The thread function threadLockingTest is implemented as:
/*
** The testThreadLockingBehavior() routine launches two separate
** threads on this routine. This routine attempts to lock a file
** descriptor then returns. The success or failure of that attempt
** allows the testThreadLockingBehavior() procedure to determine
** whether or not threads can override each others locks.
*/
static void *threadLockingTest(void *pArg){
  struct threadTestData *pData = (struct threadTestData*)pArg;
  pData->result = fcntl(pData->fd, F_SETLK, &pData->lock);
  return pArg;
}
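testThreadLockingBehavior takes a file descriptor as its argument; that file is used to test whether locks set by different threads of the same process override each other. Line 508 duplicates the descriptor, and the test runs on the duplicate. Line 510:517 fill in the two elements of the threadTestData array. Both describe the same byte range, but one requests a read lock and the other a write lock. Line 518:521 start two threads and wait for both to finish. Line 523 derives the answer from the two results. Read locks and write locks are mutually exclusive, so if threads cannot override each other's locks, whichever thread calls fcntl second gets an error (a nonzero return value). If both fcntl calls succeed, locks set by different threads of the same process do override each other. The result is stored in the global variable threadsOverrideEachOthersLocks.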
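For contrast with the thread probe, the same read-lock-then-write-lock experiment can be run across processes instead of threads. The sketch below uses fork rather than pthreads; try_lock, same_process_overrides and child_is_blocked are hypothetical names. It relies on the POSIX rule that fcntl locks belong to a process and are not inherited by a child.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking lock of the given type on byte 0. */
static int try_lock(int fd, short type){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;
  lk.l_whence = SEEK_SET;
  lk.l_start = 0;
  lk.l_len = 1;
  return fcntl(fd, F_SETLK, &lk);
}

/* Returns 1 if one process can replace its own read lock on byte 0 with a
** write lock, 0 otherwise. */
static int same_process_overrides(int fd){
  if( try_lock(fd, F_RDLCK)!=0 ) return 0;
  return try_lock(fd, F_WRLCK)==0;
}

/* Returns 1 if a child process is refused a write lock on a byte that the
** parent has read-locked, 0 if the child obtained it, -1 on error. */
static int child_is_blocked(int fd){
  pid_t pid;
  int status;
  if( try_lock(fd, F_RDLCK)!=0 ) return -1;
  pid = fork();
  if( pid<0 ) return -1;
  if( pid==0 ){
    /* Child: the parent's lock is not inherited, so this request comes
    ** from a different process and conflicts with the read lock. */
    _exit( try_lock(fd, F_WRLCK)==0 ? 0 : 1 );
  }
  if( waitpid(pid, &status, 0)<0 || !WIFEXITED(status) ) return -1;
  return WEXITSTATUS(status);
}
```

Within one process the second request simply replaces the first lock, while the separate process is refused. The thread probe in testThreadLockingBehavior asks which of these two behaviors applies between threads on the system at hand.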