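SQLite's Locking Implementation

To make transaction commits atomic, SQLite defines four types of locks. "Lock" here means a lock at the SQLite level. These locks are built on the lock primitives of the underlying operating system, and those primitives can differ between operating systems, or even between versions of the same operating system. From the point of view of a user process, SQLite's locks should hide those differences as far as possible. This article covers SQLite's locking mechanism and the lock primitives of POSIX systems; it does not analyze the Windows lock primitives.

On POSIX systems, the lock primitive SQLite uses is described by the following structure (made available through the fcntl.h header; this is the glibc definition):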

struct flock
  {
    short int l_type;	/* Type of lock: F_RDLCK, F_WRLCK, or F_UNLCK.	*/
    short int l_whence;	/* Where `l_start' is relative to (like `lseek').  */
#ifndef __USE_FILE_OFFSET64
    __off_t l_start;	/* Offset where the lock begins.  */
    __off_t l_len;	/* Size of the locked area; zero means until EOF.  */
#else
    __off64_t l_start;	/* Offset where the lock begins.  */
    __off64_t l_len;	/* Size of the locked area; zero means until EOF.  */
#endif
    __pid_t l_pid;	/* Process holding the lock.  */
  };

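A few points about the locks provided by POSIX systems:

  • They are advisory locks, not mandatory ones. They only work if every caller uses them consistently.
  • The l_type field of struct flock gives the lock type, which for a held lock is either a read lock or a write lock. Several processes can hold read locks on the same range at once and read concurrently, and further processes can still acquire new read locks. A write lock is exclusive: at any given time only one process can hold it, and no other process can hold a read lock on the same range in the meantime.
  • The l_whence, l_start and l_len fields of struct flock define the byte range the lock covers. That range does not have to lie within the current length of the file.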

Lock state is queried or changed with the fcntl system call. The fcntl commands that operate on locks are:

#  define F_GETLK	5	/* Get record locking info.  */
#  define F_SETLK	6	/* Set record locking info (non-blocking).  */
#  define F_SETLKW	7	/* Set record locking info (blocking).	*/
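
As a rough illustration of these commands, the sketch below sets a lock with F_SETLK and queries it with F_GETLK. The helper names (set_range_lock, query_range_lock, demo_child_sees_parent_lock) and the temporary file path are made up for this example and are not part of SQLite. The query runs in a forked child because F_GETLK only reports locks held by other processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Place or clear an advisory lock on [start, start+len) without blocking.
** type is F_RDLCK, F_WRLCK or F_UNLCK. Returns 0 on success, -1 on
** conflict or error. */
static int set_range_lock(int fd, short type, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;
  lk.l_whence = SEEK_SET;   /* l_start counts from the start of the file */
  lk.l_start = start;
  lk.l_len = len;
  return fcntl(fd, F_SETLK, &lk);
}

/* Ask whether a write lock on the range would be blocked by another
** process. Returns F_UNLCK if nothing blocks it, otherwise the type of
** a blocking lock, or -1 on error. */
static int query_range_lock(int fd, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = F_WRLCK;
  lk.l_whence = SEEK_SET;
  lk.l_start = start;
  lk.l_len = len;
  if( fcntl(fd, F_GETLK, &lk)==-1 ) return -1;
  return lk.l_type;
}

/* Hypothetical demo: the parent write-locks byte 0 of an empty temporary
** file (the range lies beyond the end of the file, which is allowed),
** then a forked child queries the same byte.
** Returns 1 if the child saw a write lock, 0 if not, -1 on setup failure. */
static int demo_child_sees_parent_lock(void){
  char path[] = "/tmp/flock_demo_XXXXXX";
  int status = 0;
  pid_t pid;
  int fd = mkstemp(path);
  if( fd<0 ) return -1;
  unlink(path);
  if( set_range_lock(fd, F_WRLCK, 0, 1)!=0 ){ close(fd); return -1; }
  pid = fork();
  if( pid<0 ){ close(fd); return -1; }
  if( pid==0 ){
    /* Child: fcntl locks are not inherited, so the parent's lock conflicts */
    _exit(query_range_lock(fd, 0, 1)==F_WRLCK ? 0 : 1);
  }
  if( waitpid(pid, &status, 0)<0 ){ close(fd); return -1; }
  close(fd);
  return WIFEXITED(status) && WEXITSTATUS(status)==0;
}
```

F_SETLKW takes the same structure but would block until the range becomes available instead of returning -1.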

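On top of these primitives, SQLite defines four locks:

  • SHARED_LOCK: the lock for reading. Several processes can hold it at the same time, which allows concurrent reads.
  • RESERVED_LOCK: a lock for writing. At any given time only one process can hold it. Processes that already hold a SHARED_LOCK can keep reading, and other processes can still acquire new SHARED_LOCKs.
  • PENDING_LOCK: a lock for writing. Processes that already hold a SHARED_LOCK can keep reading, but no process can acquire a new SHARED_LOCK.
  • EXCLUSIVE_LOCK: a lock for writing. At any given time only one process can hold it, and while it is held no other process can acquire a lock of any type. In other words, it can only be obtained once no other process holds any lock.

Two remarks about these four SQLite-level locks:

  • The four locks are listed in order of increasing strictness, and a lock moves between states in that order. The possible transitions are: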
UNLOCKED -> SHARED
SHARED -> RESERVED
SHARED -> (PENDING) -> EXCLUSIVE
RESERVED -> (PENDING) -> EXCLUSIVE
PENDING -> EXCLUSIVE
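  • PENDING_LOCK is only an intermediate state. Callers never request it directly from SQLite's interface function sqlite3OsLock. A file ends up in the PENDING state when a transition to EXCLUSIVE is requested but the EXCLUSIVE lock cannot be obtained immediately.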
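
The transition rules can be written down as a small predicate. This is only a sketch: the function name can_upgrade is invented for this example, and the enum simply numbers the states in order of strictness. The rules it encodes are the transition paths listed above, including the fact that PENDING is never requested directly.

```c
/* Lock states in order of increasing strictness. */
enum lock_level {
  NO_LOCK = 0,
  SHARED_LOCK = 1,
  RESERVED_LOCK = 2,
  PENDING_LOCK = 3,
  EXCLUSIVE_LOCK = 4
};

/* Hypothetical helper: returns 1 if a handle that currently holds `held`
** may ask to be upgraded to `requested`, 0 otherwise. */
static int can_upgrade(enum lock_level held, enum lock_level requested){
  if( requested<=held ) return 0;          /* not an upgrade at all */
  switch( requested ){
    case SHARED_LOCK:
      return held==NO_LOCK;                /* UNLOCKED -> SHARED */
    case RESERVED_LOCK:
      return held==SHARED_LOCK;            /* SHARED -> RESERVED */
    case EXCLUSIVE_LOCK:
      return held>=SHARED_LOCK;            /* SHARED, RESERVED or PENDING -> EXCLUSIVE */
    default:
      return 0;                            /* PENDING is never requested directly */
  }
}
```

For example, can_upgrade(NO_LOCK, EXCLUSIVE_LOCK) is rejected because a handle must hold at least SHARED before it can ask for EXCLUSIVE.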

To implement these four locks, SQLite defines the byte ranges that the lock operations act on (in the os.h header):

#ifndef SQLITE_TEST
#define PENDING_BYTE      0x40000000  /* First byte past the 1GB boundary */
#else
extern unsigned int sqlite3_pending_byte;
#define PENDING_BYTE sqlite3_pending_byte
#endif

#define RESERVED_BYTE     (PENDING_BYTE+1)
#define SHARED_FIRST      (PENDING_BYTE+2)
#define SHARED_SIZE       510

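Notes on these range definitions:

  • RESERVED_LOCK operates on the single byte at offset RESERVED_BYTE. PENDING_LOCK operates on the single byte at offset PENDING_BYTE. SHARED_LOCK and EXCLUSIVE_LOCK both operate on the 510 bytes starting at offset SHARED_FIRST.
  • On POSIX systems a single byte would be enough for SHARED_LOCK and EXCLUSIVE_LOCK. Some older versions of Windows, however, do not support read locks, so two readers locking the same byte would exclude each other. To stay compatible with those systems the shared range is made large: a reader there locks a random byte in the range, which keeps reads concurrent.
  • The lock ranges take up 512 bytes in total (1 byte for PENDING_BYTE, 1 byte for RESERVED_BYTE, and 510 bytes for SHARED_LOCK and EXCLUSIVE_LOCK), which fits within a single page. The ranges start at PENDING_BYTE. No actual data is stored in the locked range, so a database file large enough to reach that offset contains a hole there.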
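
The arithmetic behind the 512-byte figure can be checked directly. In the sketch below the macros are copied from the os.h excerpt (non-test build), while the two helper functions are invented for this example.

```c
#define PENDING_BYTE      0x40000000
#define RESERVED_BYTE     (PENDING_BYTE+1)
#define SHARED_FIRST      (PENDING_BYTE+2)
#define SHARED_SIZE       510

/* Hypothetical helper: total number of bytes covered by the lock ranges,
** from PENDING_BYTE up to the end of the shared range. */
static long lock_region_size(void){
  return (long)(SHARED_FIRST + SHARED_SIZE) - (long)PENDING_BYTE;
}

/* Hypothetical helper: returns 1 if the first and last lock bytes fall on
** the same page for the given page size, 0 otherwise. */
static int lock_region_in_one_page(long page_size){
  long first_page = (long)PENDING_BYTE / page_size;
  long last_page  = (long)(SHARED_FIRST + SHARED_SIZE - 1) / page_size;
  return first_page==last_page;
}
```

Because PENDING_BYTE is a multiple of every power-of-two page size up to 1 GB, the 512 lock bytes always begin on a page boundary and so never straddle two pages when the page size is 512 bytes or more.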

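SQLite needs locks both when it works on the rollback journal and when it works on the database file. In the SQLite source, the function that performs lock state transitions with the POSIX primitives is unixLock: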

   1365 static int unixLock(OsFile *id, int locktype){
   1366   /* The following describes the implementation of the various locks and
   1367   ** lock transitions in terms of the POSIX advisory shared and exclusive
   1368   ** lock primitives (called read-locks and write-locks below, to avoid
   1369   ** confusion with SQLite lock names). The algorithms are complicated
   1370   ** slightly in order to be compatible with windows systems simultaneously
   1371   ** accessing the same database file, in case that is ever required.
   1372   **
   1373   ** Symbols defined in os.h identify the 'pending byte' and the 'reserved
   1374   ** byte', each single bytes at well known offsets, and the 'shared byte
   1375   ** range', a range of 510 bytes at a well known offset.
   1376   **
   1377   ** To obtain a SHARED lock, a read-lock is obtained on the 'pending
   1378   ** byte'.  If this is successful, a random byte from the 'shared byte
   1379   ** range' is read-locked and the lock on the 'pending byte' released.
   1380   **
   1381   ** A process may only obtain a RESERVED lock after it has a SHARED lock.
   1382   ** A RESERVED lock is implemented by grabbing a write-lock on the
   1383   ** 'reserved byte'.
   1384   **
   1385   ** A process may only obtain a PENDING lock after it has obtained a
   1386   ** SHARED lock. A PENDING lock is implemented by obtaining a write-lock
   1387   ** on the 'pending byte'. This ensures that no new SHARED locks can be
   1388   ** obtained, but existing SHARED locks are allowed to persist. A process
   1389   ** does not have to obtain a RESERVED lock on the way to a PENDING lock.
   1390   ** This property is used by the algorithm for rolling back a journal file
   1391   ** after a crash.
   1392   **
   1393   ** An EXCLUSIVE lock, obtained after a PENDING lock is held, is
   1394   ** implemented by obtaining a write-lock on the entire 'shared byte
   1395   ** range'. Since all other locks require a read-lock on one of the bytes
   1396   ** within this range, this ensures that no other locks are held on the
   1397   ** database.
   1398   **
   1399   ** The reason a single byte cannot be used instead of the 'shared byte
   1400   ** range' is that some versions of windows do not support read-locks. By
   1401   ** locking a random byte from a range, concurrent SHARED locks may exist
   1402   ** even if the locking primitive used is always a write-lock.
   1403   */
   1404   int rc = SQLITE_OK;
   1405   unixFile *pFile = (unixFile*)id;
   1406   struct lockInfo *pLock = pFile->pLock;
   1407   struct flock lock;
   1408   int s;
   1409 
   1410   assert( pFile );
   1411   OSTRACE7("LOCK    %d %s was %s(%s,%d) pid=%d\n", pFile->h,
   1412       locktypeName(locktype), locktypeName(pFile->locktype),
   1413       locktypeName(pLock->locktype), pLock->cnt , getpid());
   1414 
   1415   /* If there is already a lock of this type or more restrictive on the
   1416   ** OsFile, do nothing. Don't use the end_lock: exit path, as
   1417   ** sqlite3OsEnterMutex() hasn't been called yet.
   1418   */
   1419   if( pFile->locktype>=locktype ){
   1420     OSTRACE3("LOCK    %d %s ok (already held)\n", pFile->h,
   1421             locktypeName(locktype));
   1422     return SQLITE_OK;
   1423   }
   1424 
   1425   /* Make sure the locking sequence is correct
   1426   */
   1427   assert( pFile->locktype!=NO_LOCK || locktype==SHARED_LOCK );
   1428   assert( locktype!=PENDING_LOCK );
   1429   assert( locktype!=RESERVED_LOCK || pFile->locktype==SHARED_LOCK );
   1430 
   1431   /* This mutex is needed because pFile->pLock is shared across threads
   1432   */
   1433   sqlite3OsEnterMutex();
   1434 
   1435   /* Make sure the current thread owns the pFile.
   1436   */
   1437   rc = transferOwnership(pFile);
   1438   if( rc!=SQLITE_OK ){
   1439     sqlite3OsLeaveMutex();
   1440     return rc;
   1441   }
   1442   pLock = pFile->pLock;
   1443 
   1444   /* If some thread using this PID has a lock via a different OsFile*
   1445   ** handle that precludes the requested lock, return BUSY.
   1446   */
   1447   if( (pFile->locktype!=pLock->locktype &&
   1448           (pLock->locktype>=PENDING_LOCK || locktype>SHARED_LOCK))
   1449   ){
   1450     rc = SQLITE_BUSY;
   1451     goto end_lock;
   1452   }
   1453 
   1454   /* If a SHARED lock is requested, and some thread using this PID already
   1455   ** has a SHARED or RESERVED lock, then increment reference counts and
   1456   ** return SQLITE_OK.
   1457   */
   1458   if( locktype==SHARED_LOCK &&
   1459       (pLock->locktype==SHARED_LOCK || pLock->locktype==RESERVED_LOCK) ){
   1460     assert( locktype==SHARED_LOCK );
   1461     assert( pFile->locktype==0 );
   1462     assert( pLock->cnt>0 );
   1463     pFile->locktype = SHARED_LOCK;
   1464     pLock->cnt++;
   1465     pFile->pOpen->nLock++;
   1466     goto end_lock;
   1467   }
   1468 
   1469   lock.l_len = 1L;
   1470 
   1471   lock.l_whence = SEEK_SET;
   1472 
   1473   /* A PENDING lock is needed before acquiring a SHARED lock and before
   1474   ** acquiring an EXCLUSIVE lock.  For the SHARED lock, the PENDING will
   1475   ** be released.
   1476   */
   1477   if( locktype==SHARED_LOCK
   1478       || (locktype==EXCLUSIVE_LOCK && pFile->locktype<PENDING_LOCK)
   1479   ){
   1480     lock.l_type = (locktype==SHARED_LOCK?F_RDLCK:F_WRLCK);
   1481     lock.l_start = PENDING_BYTE;
   1482     s = fcntl(pFile->h, F_SETLK, &lock);
   1483     if( s==(-1) ){
   1484       rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
   1485       goto end_lock;
   1486     }
   1487   }
   1488 
   1489 
   1490   /* If control gets to this point, then actually go ahead and make
   1491   ** operating system calls for the specified lock.
   1492   */
   1493   if( locktype==SHARED_LOCK ){
   1494     assert( pLock->cnt==0 );
   1495     assert( pLock->locktype==0 );
   1496 
   1497     /* Now get the read-lock */
   1498     lock.l_start = SHARED_FIRST;
   1499     lock.l_len = SHARED_SIZE;
   1500     s = fcntl(pFile->h, F_SETLK, &lock);
   1501 
   1502     /* Drop the temporary PENDING lock */
   1503     lock.l_start = PENDING_BYTE;
   1504     lock.l_len = 1L;
   1505     lock.l_type = F_UNLCK;
   1506     if( fcntl(pFile->h, F_SETLK, &lock)!=0 ){
   1507       rc = SQLITE_IOERR_UNLOCK;  /* This should never happen */
   1508       goto end_lock;
   1509     }
   1510     if( s==(-1) ){
   1511       rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
   1512     }else{
   1513       pFile->locktype = SHARED_LOCK;
   1514       pFile->pOpen->nLock++;
   1515       pLock->cnt = 1;
   1516     }
   1517   }else if( locktype==EXCLUSIVE_LOCK && pLock->cnt>1 ){
   1518     /* We are trying for an exclusive lock but another thread in this
   1519     ** same process is still holding a shared lock. */
   1520     rc = SQLITE_BUSY;
   1521   }else{
   1522     /* The request was for a RESERVED or EXCLUSIVE lock.  It is
   1523     ** assumed that there is a SHARED or greater lock on the file
   1524     ** already.
   1525     */
   1526     assert( 0!=pFile->locktype );
   1527     lock.l_type = F_WRLCK;
   1528     switch( locktype ){
   1529       case RESERVED_LOCK:
   1530         lock.l_start = RESERVED_BYTE;
   1531         break;
   1532       case EXCLUSIVE_LOCK:
   1533         lock.l_start = SHARED_FIRST;
   1534         lock.l_len = SHARED_SIZE;
   1535         break;
   1536       default:
   1537         assert(0);
   1538     }
   1539     s = fcntl(pFile->h, F_SETLK, &lock);
   1540     if( s==(-1) ){
   1541       rc = (errno==EINVAL) ? SQLITE_NOLFS : SQLITE_BUSY;
   1542     }
   1543   }
   1544 
   1545   if( rc==SQLITE_OK ){
   1546     pFile->locktype = locktype;
   1547     pLock->locktype = locktype;
   1548   }else if( locktype==EXCLUSIVE_LOCK ){
   1549     pFile->locktype = PENDING_LOCK;
   1550     pLock->locktype = PENDING_LOCK;
   1551   }
   1552 
   1553 end_lock:
   1554   sqlite3OsLeaveMutex();
   1555   OSTRACE4("LOCK    %d %s %s\n", pFile->h, locktypeName(locktype),
   1556       rc==SQLITE_OK ? "ok" : "failed");
   1557   return rc;
   1558 }

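The second parameter, locktype, is the lock type being requested.

  • Line1419:1423: if the file handle already holds a lock at least as strict as the requested one, nothing needs to be done and the function returns SQLITE_OK.
  • Line1431:1433: enter the mutex. POSIX lock primitives are normally used between processes; exclusion between threads of the same process has to be provided by a mutex.
  • Line1435:1441: if the file handle does not belong to the current thread, its ownership is transferred to the current thread.
  • Line1444:1452: if another thread already holds a lock on this file through a different file handle, and that lock precludes the requested one, return SQLITE_BUSY.
  • Line1454:1467: if another thread already holds a SHARED_LOCK or RESERVED_LOCK on this file through a different file handle, and the current thread is requesting SHARED_LOCK, no call to the POSIX primitive is needed. The reference counts are incremented and the function returns SQLITE_OK.
  • Line1469 onward handles the cases that do require a call to the underlying primitive.
  • Line1471: lock byte ranges are all computed from the start of the file.
  • Line1473:1487: a request for SHARED_LOCK or EXCLUSIVE_LOCK first has to lock the pending byte, with a read lock for SHARED_LOCK and a write lock for EXCLUSIVE_LOCK (the latter only if the handle does not already hold PENDING_LOCK).
  • Line1490:1516: request SHARED_LOCK with a system call on the shared range. Whether or not that succeeds, the temporary lock on the pending byte is then released.
  • Line1517:1520: if the current thread is requesting EXCLUSIVE_LOCK while another thread of the same process still holds a shared lock, return SQLITE_BUSY.
  • Line1522:1543: request RESERVED_LOCK or EXCLUSIVE_LOCK with a system call.
  • Line1545:1547: if the request succeeded, the lock type of the file handle and the lock type in the per-inode structure are both set to the requested type.
  • Line1548:1550: if the request failed and was for EXCLUSIVE_LOCK, both are set to PENDING_LOCK instead. Control only reaches this point once the pending byte has been write-locked, so the handle really is in the PENDING state.
  • Line1554: leave the mutex.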
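
To make the role of the pending byte concrete, here is a stripped-down sketch of the two fcntl sequences described above, without the mutex, ownership and reference-count handling of the real function. The helper names and the temporary file are invented for this example; only the byte offsets come from os.h. The demo forks a child to play a second process trying to become a new reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PENDING_BYTE  0x40000000
#define SHARED_FIRST  (PENDING_BYTE+2)
#define SHARED_SIZE   510

/* Non-blocking fcntl() lock on [start, start+len). 0 = ok, -1 = refused. */
static int lock_range(int fd, short type, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;
  lk.l_whence = SEEK_SET;
  lk.l_start = start;
  lk.l_len = len;
  return fcntl(fd, F_SETLK, &lk);
}

/* UNLOCKED -> SHARED: read-lock the pending byte, read-lock the shared
** range, then drop the temporary lock on the pending byte. */
static int acquire_shared(int fd){
  int s;
  if( lock_range(fd, F_RDLCK, PENDING_BYTE, 1)!=0 ) return -1; /* a writer is pending */
  s = lock_range(fd, F_RDLCK, SHARED_FIRST, SHARED_SIZE);
  if( lock_range(fd, F_UNLCK, PENDING_BYTE, 1)!=0 ) return -1;
  return s;
}

/* First step towards EXCLUSIVE: write-lock the pending byte. */
static int acquire_pending(int fd){
  return lock_range(fd, F_WRLCK, PENDING_BYTE, 1);
}

/* Hypothetical demo: the parent becomes a reader and then takes the pending
** byte; a forked child then tries to become a new reader.
** Returns 1 if the child was refused, 0 if it got in, -1 on setup failure. */
static int demo_pending_blocks_new_readers(void){
  char path[] = "/tmp/sqlite_lock_demo_XXXXXX";
  int status = 0;
  pid_t pid;
  int fd = mkstemp(path);
  if( fd<0 ) return -1;
  unlink(path);
  if( acquire_shared(fd)!=0 || acquire_pending(fd)!=0 ){ close(fd); return -1; }
  pid = fork();
  if( pid<0 ){ close(fd); return -1; }
  if( pid==0 ){
    _exit(acquire_shared(fd)==-1 ? 0 : 1);   /* child: expect to be refused */
  }
  if( waitpid(pid, &status, 0)<0 ){ close(fd); return -1; }
  close(fd);
  return WIFEXITED(status) && WEXITSTATUS(status)==0;
}
```

The child is turned away at its very first step, the read lock on the pending byte, which is how a pending writer keeps new readers out while existing readers finish.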

Line1437 calls transferOwnership to transfer ownership of the file handle. It is implemented as:

    749 static int transferOwnership(unixFile *pFile){
    750   int rc;
    751   pthread_t hSelf;
    752   if( threadsOverrideEachOthersLocks ){
    753     /* Ownership transfers not needed on this system */
    754     return SQLITE_OK;
    755   }
    756   hSelf = pthread_self();
    757   if( pthread_equal(pFile->tid, hSelf) ){
    758     /* We are still in the same thread */
    759     OSTRACE1("No-transfer, same thread\n");
    760     return SQLITE_OK;
    761   }
    762   if( pFile->locktype!=NO_LOCK ){
    763     /* We cannot change ownership while we are holding a lock! */
    764     return SQLITE_MISUSE;
    765   }
    766   OSTRACE4("Transfer ownership of %d from %d to %d\n",
    767             pFile->h, pFile->tid, hSelf);
    768   pFile->tid = hSelf;
    769   if (pFile->pLock != NULL) {
    770     releaseLockInfo(pFile->pLock);
    771     rc = findLockInfo(pFile->h, &pFile->pLock, 0);
    772     OSTRACE5("LOCK    %d is now %s(%s,%d)\n", pFile->h,
    773            locktypeName(pFile->locktype),
    774            locktypeName(pFile->pLock->locktype), pFile->pLock->cnt);
    775     return rc;
    776   } else {
    777     return SQLITE_OK;
    778   }
    779 }

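  • Line752:755: if locks taken by different threads of a process override each other, no ownership transfer is needed and the function returns SQLITE_OK.
  • Line756:761: if the file handle already belongs to the current thread, nothing needs to be done and the function returns SQLITE_OK.
  • Line762:765: a file handle that currently holds a lock cannot change owner, so the function returns SQLITE_MISUSE.
  • Line768: set the owning thread of the file handle to the current thread.
  • Line769:778: update the inode lock information attached to the handle.

threadsOverrideEachOthersLocks is a global variable that records whether lock operations by different threads of a process can override each other. It is defined as: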

/*
** This variable records whether or not threads can override each others
** locks.
**
**    0:  No.  Threads cannot override each others locks.
**    1:  Yes.  Threads can override each others locks.
**   -1:  We don't know yet.
**
** On some systems, we know at compile-time if threads can override each
** others locks.  On those systems, the SQLITE_THREAD_OVERRIDE_LOCK macro
** will be set appropriately.  On other systems, we have to check at
** runtime.  On these latter systems, SQLITE_THREAD_OVERRIDE_LOCK is
** undefined.
**
** This variable normally has file scope only.  But during testing, we make
** it a global so that the test code can change its value in order to verify
** that the right stuff happens in either case.
*/
#ifndef SQLITE_THREAD_OVERRIDE_LOCK
# define SQLITE_THREAD_OVERRIDE_LOCK -1
#endif
#ifdef SQLITE_TEST
int threadsOverrideEachOthersLocks = SQLITE_THREAD_OVERRIDE_LOCK;
#else
static int threadsOverrideEachOthersLocks = SQLITE_THREAD_OVERRIDE_LOCK;
#endif

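If it is known at compile time that, on the target operating system, locks taken by different threads of a process override each other, the macro SQLITE_THREAD_OVERRIDE_LOCK can be defined as 1; if it is known that they do not, it can be defined as 0. If this cannot be determined at compile time, the macro is either defined as -1 or left undefined, and the answer has to be found at run time with a test routine. Note that if SQLite only ever runs in a single-threaded environment, the situation is equivalent to threads overriding each other's locks.

The function that makes this determination at run time is testThreadLockingBehavior: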

    498 /*
    499 ** This procedure attempts to determine whether or not threads
    500 ** can override each others locks then sets the
    501 ** threadsOverrideEachOthersLocks variable appropriately.
    502 */
    503 static void testThreadLockingBehavior(int fd_orig){
    504   int fd;
    505   struct threadTestData d[2];
    506   pthread_t t[2];
    507 
    508   fd = dup(fd_orig);
    509   if( fd<0 ) return;
    510   memset(d, 0, sizeof(d));
    511   d[0].fd = fd;
    512   d[0].lock.l_type = F_RDLCK;
    513   d[0].lock.l_len = 1;
    514   d[0].lock.l_start = 0;
    515   d[0].lock.l_whence = SEEK_SET;
    516   d[1] = d[0];
    517   d[1].lock.l_type = F_WRLCK;
    518   pthread_create(&t[0], 0, threadLockingTest, &d[0]);
    519   pthread_create(&t[1], 0, threadLockingTest, &d[1]);
    520   pthread_join(t[0], 0);
    521   pthread_join(t[1], 0);
    522   close(fd);
    523   threadsOverrideEachOthersLocks =  d[0].result==0 && d[1].result==0;
    524 }

It uses the structure threadTestData, defined as:

/*
** This structure holds information passed into individual test
** threads by the testThreadLockingBehavior() routine.
*/
struct threadTestData {
  int fd;                /* File to be locked */
  struct flock lock;     /* The locking operation */
  int result;            /* Result of the locking operation */
};

The thread routine threadLockingTest is implemented as:

/*
** The testThreadLockingBehavior() routine launches two separate
** threads on this routine.  This routine attempts to lock a file
** descriptor then returns.  The success or failure of that attempt
** allows the testThreadLockingBehavior() procedure to determine
** whether or not threads can override each others locks.
*/
static void *threadLockingTest(void *pArg){
  struct threadTestData *pData = (struct threadTestData*)pArg;
  pData->result = fcntl(pData->fd, F_SETLK, &pData->lock);
  return pArg;
}

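testThreadLockingBehavior takes a file descriptor as its argument; that file is used to test whether locks taken by different threads of a process override each other.

  • Line508: duplicate the file descriptor. The test runs on the copy.
  • Line510:517: fill in the two elements of the threadTestData array. Both operate on the same byte range, but one asks for a read lock and the other for a write lock.
  • Line518:521: start two threads and wait for both to finish.
  • Line523: derive the answer from the two results. Read locks and write locks exclude each other, so if threads cannot override each other's locks, the fcntl call in whichever thread runs second fails and returns a non-zero value (-1). If both fcntl calls succeed, locks taken by different threads of the process do override each other. The result is stored in the global variable threadsOverrideEachOthersLocks.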
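
To try the same probe outside SQLite, the logic can be packaged as a standalone function that works on its own temporary file. This is a sketch modelled on the routines above, not SQLite code: the names lock_attempt, try_lock and probe_thread_lock_override, as well as the temporary file path, are invented for this example. Some toolchains need -pthread when linking.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One lock attempt made by one thread. */
struct lock_attempt {
  int fd;               /* File to be locked */
  struct flock lock;    /* The locking operation */
  int result;           /* Return value of fcntl() */
};

/* Thread body: try the lock once, without blocking, and record the result. */
static void *try_lock(void *arg){
  struct lock_attempt *p = (struct lock_attempt*)arg;
  p->result = fcntl(p->fd, F_SETLK, &p->lock);
  return arg;
}

/* Returns 1 if two threads could take a read lock and a write lock on the
** same byte (threads override each other's locks), 0 if one of them was
** refused, -1 if the probe could not be run. */
static int probe_thread_lock_override(void){
  char path[] = "/tmp/thread_lock_probe_XXXXXX";
  struct lock_attempt d[2];
  pthread_t t[2];
  int fd = mkstemp(path);
  if( fd<0 ) return -1;
  unlink(path);
  memset(d, 0, sizeof(d));
  d[0].fd = fd;
  d[0].lock.l_type = F_RDLCK;
  d[0].lock.l_whence = SEEK_SET;
  d[0].lock.l_start = 0;
  d[0].lock.l_len = 1;
  d[1] = d[0];
  d[1].lock.l_type = F_WRLCK;     /* same byte, conflicting lock type */
  if( pthread_create(&t[0], NULL, try_lock, &d[0])!=0 ){
    close(fd);
    return -1;
  }
  if( pthread_create(&t[1], NULL, try_lock, &d[1])!=0 ){
    pthread_join(t[0], NULL);
    close(fd);
    return -1;
  }
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
  close(fd);
  return d[0].result==0 && d[1].result==0;
}
```

Which value comes back depends on the threading implementation of the system it runs on, which is exactly why SQLite defers the question to run time when the macro is not set.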
