(I) What the MongoDB documentation says about locking in replication
How does concurrency affect a replica set primary?
In replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep both consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
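To make “lock both databases at the same time” concrete, here is a minimal C++ sketch of the idea only, not MongoDB's actual code: the data write and the matching oplog write happen while both locks are held together, so no other writer can ever see one without the other. All of the names here (DataDb, LocalDb, primaryInsert) are invented for illustration.
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for a user database and the "local" database.
struct DataDb  { std::mutex lock; std::vector<std::string> docs;  };
struct LocalDb { std::mutex lock; std::vector<std::string> oplog; };

// A primary must hold the write lock of the target database *and* of
// "local" while it writes, so the data write and its oplog entry become
// visible together, never separately ("all-or-nothing").
void primaryInsert(DataDb& db, LocalDb& local, const std::string& doc) {
    std::scoped_lock both(db.lock, local.lock);          // lock both databases at once
    db.docs.push_back(doc);                              // write to the collection
    local.oplog.push_back("{op:\"i\", o:" + doc + "}");  // write the oplog entry
}

int main() {
    DataDb db;
    LocalDb local;
    primaryInsert(db, local, "{_id: 1}");
}
The source walk-through below shows the same pairing in mongod itself: Lock::DBWrite on the target namespace plus Lock::DBWrite("local") inside logOp.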
How does concurrency affect secondaries?
In replication, MongoDB does not apply writes serially to secondaries. Secondaries collect oplog entries in batches and then apply those batches in parallel. Secondaries do not allow reads while applying the write operations, and apply write operations in the order that they appear in the oplog.
MongoDB can apply several writes in parallel on replica set secondaries, in two phases:
- During the first prefetch phase, under a read lock, the mongod ensures that all documents affected by the operations are in memory. During this phase, other clients may execute queries against this member.
- A thread pool using write locks applies all write operations in the batch as part of a coordinated write phase.
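The two phases map naturally onto a reader/writer lock, as the following sketch tries to show. It is an illustration of the idea only, not MongoDB's implementation, and every name in it (OplogEntry, prefetchPhase, applyPhase) is made up: the prefetch pass holds a shared lock, so queries could still run concurrently, while the apply pass holds the exclusive lock and fans the batch out to a few threads, which is why reads are blocked during that phase. The real ordering rules for entries within a batch are ignored here.
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

struct OplogEntry { std::string ns, doc; };

std::shared_mutex memberLock;            // the secondary's read/write lock
std::vector<std::string> dataFiles;      // stand-in for the data files
std::mutex dataFilesMutex;               // serializes pool threads on the store

// Phase 1 (prefetch): shared lock only, so other clients can still query
// this member while the documents the batch touches are paged in.
void prefetchPhase(const std::vector<OplogEntry>& batch) {
    std::shared_lock<std::shared_mutex> readLock(memberLock);
    for (const auto& e : batch) (void)e.doc.size();   // pretend to fault pages in
}

// Phase 2 (apply): exclusive lock; a small thread pool applies the whole
// batch, and no reads are served until it finishes.
void applyPhase(const std::vector<OplogEntry>& batch, unsigned nThreads) {
    std::unique_lock<std::shared_mutex> writeLock(memberLock);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nThreads; ++t)
        pool.emplace_back([&batch, t, nThreads] {
            for (size_t i = t; i < batch.size(); i += nThreads) {
                std::lock_guard<std::mutex> g(dataFilesMutex);
                dataFiles.push_back(batch[i].doc);    // apply one oplog entry
            }
        });
    for (auto& th : pool) th.join();
}

int main() {
    std::vector<OplogEntry> batch{{"test.c", "{_id:1}"}, {"test.c", "{_id:2}"}};
    prefetchPhase(batch);
    applyPhase(batch, 2);
}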
(II) Understanding the source code in light of the MongoDB documentation
1. MongoDB startup code flow
-------db.cpp-----
main
--mongoDbMain
----initAndListen
------_initAndListen
---------Listen
------------createServer(options, new MyMessageHandler() );
------------startReplication
Of the flow above, we mainly focus on the last two calls.
(1) MyMessageHandler is the class that processes a message after mongod receives a request; its process function handles the request. The call flow of process is as follows:
process
--assembleResponse
if ( op == dbQuery ) {                              // query data
}
else if ( op == dbGetMore ) {                       // getMore (continue a query)
}
else if ( op == dbMsg ) {
}
else {
try {
if ( op == dbKillCursors ) {
}
else if ( !nsString.isValid() ) {
}
else if ( op == dbInsert ) {
receivedInsert(m, currentOp);                   // insert data
}
else if ( op == dbUpdate ) {
receivedUpdate(m, currentOp);                   // update data
}
else if ( op == dbDelete ) {
receivedDelete(m, currentOp);                   // delete data
}
else {
}
Here we only analyze the main body of the write (insert) path:
receivedInsert(m, currentOp);
while ( true ) {
try {
Lock::DBWrite lk(ns);                               // acquire the write lock on the target database
......
if (multi.size() > 1) {
const bool keepGoing = d.reservedField() & InsertOption_ContinueOnError;
insertMulti(keepGoing, ns, multi, op);          // write the data and the oplog
} else {
checkAndInsert(ns, multi[0]);                   // write the data and the oplog
globalOpCounters.incInsertInWriteLock(1);
op.debug().ninserted = 1;
}
return;
}
The two calls marked above are the key point; insertMulti in turn still mainly calls checkAndInsert:
void checkAndInsert(const char *ns, /*modifies*/BSONObj& js) {
......
theDataFileMgr.insertWithObjMod()                   // insert the document into the database files
logOp("i", ns, js);                                 // write the oplog entry
......
}
The job of logOp is to write the oplog (the oplog collection in the local database). Its main body is as follows:
if ( replSettings.master ) {                        // this node is the master in master/slave mode
_logOp(opstr, ns, 0, obj, patt, b, fromMigrate);    // the main body of _logOp is shown below
}
_logOpOld
----Lock::DBWrite lk("local");                      // lock the local database
----write the oplog entry into the oplog collection
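As a rough picture of what logOp/_logOpOld produce, the sketch below (illustrative only, with invented types) builds an oplog-style record with the ts/op/ns/o fields that also appear in the keep-alive entry shown later, and appends it to a fixed-size ring buffer that mimics the capped oplog collection, while holding a lock that stands in for Lock::DBWrite("local").
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>

// A very rough oplog record: { ts, op, ns, o }.
struct OplogRecord {
    std::uint64_t ts;    // timestamp
    std::string   op;    // "i" insert, "u" update, "d" delete, "n" no-op
    std::string   ns;    // namespace, e.g. "test.coll"
    std::string   o;     // the document / change
};

class LocalOplog {
public:
    explicit LocalOplog(std::size_t maxEntries) : maxEntries_(maxEntries) {}

    // Similar in spirit to _logOpOld: take the "local" write lock, append
    // the entry, and drop the oldest one if the capped size is exceeded
    // (the oplog is a capped collection).
    void append(OplogRecord rec) {
        std::lock_guard<std::mutex> localLock(localDbLock_);
        entries_.push_back(std::move(rec));
        if (entries_.size() > maxEntries_) entries_.pop_front();
    }

private:
    std::mutex localDbLock_;          // stand-in for Lock::DBWrite("local")
    std::deque<OplogRecord> entries_; // stand-in for the oplog collection
    std::size_t maxEntries_;
};

int main() {
    LocalOplog oplog(1000);
    oplog.append({1373347524, "i", "test.coll", "{_id: 1}"});
}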
(2) Summary of a write operation on the master (primary)
1. Take the write lock on the database being written to
2. Write the data
3. Take the write lock on the local database
4. Write the oplog entry
(3) The main flow and role of the startReplication() function
Its main body is as follows:
if ( replSettings.slave ) {
boost::thread repl_thread(replSlaveThread);         // slave node: start a replSlaveThread thread
}
if ( replSettings.master ) {
replSettings.master = true;
createOplog();                                      // master node: create the oplog collection
boost::thread t(replMasterThread);                  // start a replMasterThread thread
}
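The same pattern can be sketched as follows; the ReplSettings struct and all function names below are stand-ins, not MongoDB's own types: depending on the configured role, create the oplog on a master and launch one long-lived background thread per role.
#include <chrono>
#include <iostream>
#include <thread>

struct ReplSettings { bool master = false; bool slave = false; };

void replMasterThreadFn() { /* keep-alive loop, see the next section */ }
void replSlaveThreadFn()  { /* pull-and-apply loop, see the slave section */ }

void createOplogStub() { std::cout << "create capped oplog collection\n"; }

// Same shape as startReplication: check the role flags, create the oplog
// on a master, and start one background thread per role.
void startReplicationSketch(const ReplSettings& settings) {
    if (settings.slave)
        std::thread(replSlaveThreadFn).detach();   // slave: pull oplog from masters
    if (settings.master) {
        createOplogStub();                         // master: make sure the oplog exists
        std::thread(replMasterThreadFn).detach();  // master: keep-alive writer
    }
}

int main() {
    ReplSettings settings;
    settings.master = true;
    startReplicationSketch(settings);
    std::this_thread::sleep_for(std::chrono::milliseconds(10)); // let the thread run
}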
1. First, the role of the replMasterThread thread on the master
static void replMasterThread() {
int toSleep = 10;
while( 1 ) {
sleepsecs( toSleep );                               // sleep for 10 seconds
{
writelocktry lk(1);
logKeepalive();                                 // the key call, analyzed below
}
}
The main body of logKeepalive:
void logKeepalive() {
_logOp("n", "", 0, BSONObj(), 0, 0, false);         // see the master write path above: this locks the local database and writes the oplog entry
}
It is now clear what this thread does: every 10 seconds it writes one no-op entry into the oplog, such as:
{ "ts" : Timestamp(1373347524, 1), "op" : "n", "ns" : "", "o" : { } }
2. Analysis of the replSlaveThread thread on the slave side
void replSlaveThread() {
sleepsecs(1);
Client::initThread("replslave");
while ( 1 ) {
try {
replMain();                                     // the slave thread calls this in a loop; its role is analyzed below
sleepsecs(5);
}
}
}
The main body of replMain is as follows; it calls _replMain:
while ( 1 ) {
s = _replMain(sources, nApplied);
}
The main body of _replMain:
_replMain
{
Lock::GlobalWrite lk;                               // acquire the global write lock
ReplSource::loadAll(sources);                       // a slave node can be configured with multiple masters
}                                                   // the lock is released here
for ( ReplSource::SourceVector::iterator i = sources.begin(); i != sources.end(); i++ ) {
res = s->sync(nApplied);
sync
----sync_pullOpLog
---------fetch oplog entries from the master's oplog
---------scoped_ptr<Lock::GlobalWrite> lk( justOne ? 0 : new Lock::GlobalWrite() );   // acquire the global write lock
---------sync_pullOpLog_applyOperation(BSONObj& op, bool alreadyLocked)               // apply the oplog entries to the slave's own database
3. Summary of the slave side
The slave runs a thread in a loop that reads oplog entries from the master, acquires the global write lock, and writes them into its own database.
This matches the official documentation: while the slave is fetching oplog data from the master it can still serve requests, but while it is applying the oplog to its own database it holds the global write lock and cannot serve clients.
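The slave-side behaviour summarized above can be sketched as follows; fetchBatchFromMaster and the other names are invented stand-ins for what sync_pullOpLog does. The point of the sketch is the lock scope: fetching the master's oplog entries happens without holding the slave's write lock, so the slave can still serve reads, while applying the batch happens under a global write lock, during which clients are blocked.
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

std::shared_mutex globalLock;           // stand-in for Lock::GlobalWrite
std::vector<std::string> slaveData;     // the slave's own data

// Stand-in for the network fetch done by sync_pullOpLog: no local lock is
// held here, so client reads on the slave are still possible.
std::vector<std::string> fetchBatchFromMaster() {
    return {"{op:\"i\", o:{_id:1}}", "{op:\"i\", o:{_id:2}}"};
}

// Stand-in for applying the batch (sync_pullOpLog_applyOperation): the
// global write lock is held while the oplog entries are replayed, so the
// slave cannot serve clients during this step.
void applyBatch(const std::vector<std::string>& batch) {
    std::unique_lock<std::shared_mutex> lk(globalLock);
    for (const auto& op : batch)
        slaveData.push_back(op);        // replay one oplog entry
}

// The replSlaveThread / replMain loop in miniature: fetch, apply, sleep.
void slaveSyncLoop(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        auto batch = fetchBatchFromMaster();   // no write lock held
        applyBatch(batch);                     // global write lock held
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

int main() { slaveSyncLoop(1); }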