Reposted from:
http://www.itnose.net/detail/6193518.html
Note: parts of this article are drawn from the MongoDB实战 (MongoDB in Practice) articles written by 红丸.
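1. Introduction
MongoDB supports asynchronous replication across multiple machines to provide failover and redundancy. At any given moment only one machine in the group accepts writes, and this is how MongoDB guarantees data consistency. Reads go to the primary by default, but a client can choose to send them to the secondaries (see the previous two articles on Replica Set members and concepts for details).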
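MongoDB offers two high-availability schemes:
- Master-Slave replication: start one server with the --master option and the other with the --slave and --source options, and the two stay in sync. Recent MongoDB versions no longer recommend this scheme. The official documentation carries the following warning:
IMPORTANT
Replica sets replace master-slave replication for most use cases. If possible, use replica sets rather than master-slave replication for all new production deployments. This documentation remains to support legacy deployments and for archival purposes only.
In other words, replica sets have replaced master-slave replication in most use cases and should be used for new production deployments wherever possible.
- Replica Set: introduced in MongoDB 1.6, this feature is more capable than the earlier replication mechanism. It adds automatic failover and automatic recovery of member nodes, keeps the data on all members in sync, and greatly reduces maintenance effort. Auto-sharding explicitly does not support replica pairs, so a Replica Set, with fully automatic failover, is the recommended choice.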
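2. Architecture
A MongoDB Replica Set is architecturally very similar to a cluster, and you can treat it as one, because it does the same job: when one node fails, the remaining nodes take over the workload without requiring any downtime. This walkthrough uses the most common layout, a three-member set.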
3. Deploying the Replica Set
The following steps walk through deploying this architecture.
- Environment
OS: CentOS 6.4 64-bit (a single virtual machine)
MongoDB version: 2.6
- Steps
Create the data directories:
[root@localhost mongodb]# mkdir -p r0
[root@localhost mongodb]# mkdir -p r1
[root@localhost mongodb]# mkdir -p r2
Create the log directory:
[root@localhost mongodb]# mkdir -p log
Create the key files. The --keyFile option takes the full path to the private key that identifies the set. If the key file contents differ between instances, the members cannot authenticate to one another and the set will not come up properly.
[root@localhost mongodb]# mkdir -p key
[root@localhost mongodb]# echo "this is rs1 super secret key">key/r0
[root@localhost mongodb]# echo "this is rs1 super secret key">key/r1
[root@localhost mongodb]# echo "this is rs1 super secret key">key/r2
[root@localhost mongodb]# chmod 600 key/r*
Start the three instances:
[root@localhost bin]# ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r0 --fork --port 28010 --dbpath=/usr/local/mongodb/r0 --logpath=/usr/local/mongodb/log/r0.log --logappend
about to fork child process, waiting until server is ready for connections.
forked process: 2545
[root@localhost bin]# ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r1 --fork --port 28011 --dbpath=/usr/local/mongodb/r1 --logpath=/usr/local/mongodb/log/r1.log --logappend
about to fork child process, waiting until server is ready for connections.
forked process: 2596
[root@localhost bin]# ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r2 --fork --port 28012 --dbpath=/usr/local/mongodb/r2 --logpath=/usr/local/mongodb/log/r2.log --logappend
about to fork child process, waiting until server is ready for connections.
forked process: 2602
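Note: the three instances listen on ports 28010, 28011 and 28012 and store their data in r0, r1 and r2 respectively.
Configure and initialize the Replica Set:
[root@localhost bin]# ./mongo --port 28010
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:28010/test
> config_rs1={_id:"rs1",members:[{_id:0,host:'localhost:28010',priority:1},{_id:1,host:'localhost:28011'},{_id:2,host:'localhost:28012'}]}
{
  "_id" : "rs1",
  "members" : [
    { "_id" : 0, "host" : "localhost:28010", "priority" : 1 },
    { "_id" : 1, "host" : "localhost:28011" },
    { "_id" : 2, "host" : "localhost:28012" }
  ]
}
>
Note: the configuration lists the host and port of every member. priority:1 is the default value, so here it only makes the setting explicit; to make 28010 the preferred primary in elections, its priority would have to be higher than that of the other members. In this run 28010 becomes primary because rs.initiate() is executed on it and the other two members must first sync from it.
> rs.initiate(config_rs1);
{
  "info" : "Config now saved locally. Should come online in about a minute.",
  "ok" : 1
}
>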
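The configuration document above can also be generated instead of typed by hand. The following is a minimal sketch, not part of the original walkthrough: buildReplSetConfig and its preferredHost parameter are hypothetical names, and the choice of priority 2 for the preferred member is an assumption made only to show a value higher than the default of 1.

```javascript
// Build a replica set configuration document from a list of "host:port" strings.
// preferredHost (hypothetical parameter) receives priority 2; every other member
// gets the default priority of 1, written out explicitly for readability.
function buildReplSetConfig(setName, hosts, preferredHost) {
  var members = [];
  for (var i = 0; i < hosts.length; i++) {
    members.push({
      _id: i,
      host: hosts[i],
      priority: hosts[i] === preferredHost ? 2 : 1
    });
  }
  return { _id: setName, members: members };
}

var config = buildReplSetConfig(
  "rs1",
  ["localhost:28010", "localhost:28011", "localhost:28012"],
  "localhost:28010"
);
```

In the mongo shell the resulting object could be passed to rs.initiate(config) in the same way as config_rs1.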
Check the replica set status:
rs1:OTHER> rs.status();
{
  "set" : "rs1",
  "date" : ISODate("2015-01-16T03:10:41Z"),
  "myState" : 2,
  "members" : [
    {
      "_id" : 0,
      "name" : "localhost:28010",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 260,
      "optime" : Timestamp(1421377833, 1),
      "optimeDate" : ISODate("2015-01-16T03:10:33Z"),
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "localhost:28011",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 8,
      "optime" : Timestamp(0, 0),
      "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:10:39Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:10:39Z"),
      "pingMs" : 0,
      "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
    },
    {
      "_id" : 2,
      "name" : "localhost:28012",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 8,
      "optime" : Timestamp(0, 0),
      "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:10:39Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:10:40Z"),
      "pingMs" : 0,
      "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
    }
  ],
  "ok" : 1
}
Note: the node named localhost:28010 reports a stateStr of SECONDARY. Why? Look at the state, stateStr and lastHeartbeatMessage fields of the other two members in the output above. When rs.initiate() was called, it returned the message "Should come online in about a minute", meaning that initialization takes about a minute, yet the call itself returns immediately. Running rs.status() straight away therefore shows a set that is still initializing: no primary has been elected yet, and the three members are in state 2 or 5. For the members in state 5 (STARTUP2), lastHeartbeatMessage shows that they are waiting to begin their initial sync.
Run rs.status() again a little later:
rs1:PRIMARY> rs.status();
{
  "set" : "rs1",
  "date" : ISODate("2015-01-16T03:13:09Z"),
  "myState" : 1,
  "members" : [
    {
      "_id" : 0,
      "name" : "localhost:28010",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 408,
      "optime" : Timestamp(1421377833, 1),
      "optimeDate" : ISODate("2015-01-16T03:10:33Z"),
      "electionTime" : Timestamp(1421377841, 1),
      "electionDate" : ISODate("2015-01-16T03:10:41Z"),
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "localhost:28011",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 156,
      "optime" : Timestamp(1421377833, 1),
      "optimeDate" : ISODate("2015-01-16T03:10:33Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:13:09Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:13:07Z"),
      "pingMs" : 1,
      "syncingTo" : "localhost:28010"
    },
    {
      "_id" : 2,
      "name" : "localhost:28012",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 156,
      "optime" : Timestamp(1421377833, 1),
      "optimeDate" : ISODate("2015-01-16T03:10:33Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:13:08Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:13:09Z"),
      "pingMs" : 0,
      "syncingTo" : "localhost:28010"
    }
  ],
  "ok" : 1
}
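Because rs.initiate() returns before a primary has been elected, a script that depends on the primary needs to wait for it. The sketch below shows one way this could be done; waitForPrimary, getStatus and pause are hypothetical names, and the status source is injected so the logic can be tried without a running mongod. The two snapshots are trimmed-down versions of the two rs.status() results shown above.

```javascript
// Poll a status source until some member reports state 1 (PRIMARY).
// getStatus: caller-supplied function returning an rs.status()-shaped object.
// pause: optional caller-supplied function invoked between attempts.
function waitForPrimary(getStatus, maxAttempts, pause) {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    var members = getStatus().members || [];
    for (var i = 0; i < members.length; i++) {
      if (members[i].state === 1) {
        return { primary: members[i].name, attempts: attempt };
      }
    }
    if (pause && attempt < maxAttempts) {
      pause();
    }
  }
  return null; // no primary was seen within maxAttempts
}

// Stand-in for rs.status(): first call mimics the set right after rs.initiate(),
// later calls mimic the settled set.
var snapshots = [
  { members: [
    { name: "localhost:28010", state: 2 },
    { name: "localhost:28011", state: 5 },
    { name: "localhost:28012", state: 5 }
  ] },
  { members: [
    { name: "localhost:28010", state: 1 },
    { name: "localhost:28011", state: 2 },
    { name: "localhost:28012", state: 2 }
  ] }
];
var calls = 0;
var waitResult = waitForPrimary(function () {
  var snapshot = snapshots[Math.min(calls, snapshots.length - 1)];
  calls++;
  return snapshot;
}, 5);
```

Inside the mongo shell, getStatus could plausibly be function () { return rs.status(); } and pause could wrap the shell's sleep(ms) helper; both are assumptions about how one might wire it up, not something the original article does.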
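The Replica Set is now fully initialized and every node is healthy. The member with state=1 is the primary, and the members with state=2 are secondaries. Both secondaries sync their data from the member on port 28010, as the syncingTo field shows.
Field reference:
_id: unique member id
name: host name and port
health: health status, 1 means healthy
state: member state; 1 is PRIMARY and 2 is SECONDARY; other values are covered later, for example 5 is STARTUP2 (initial sync in progress)
stateStr: textual description of the state
optime: timestamp of the last oplog operation the member has applied
optimeDate: the same moment expressed as a date
electionTime: timestamp at which the primary was elected
electionDate: the election time expressed as a date
lastHeartbeat: time of the last response to a heartbeat sent to this member
lastHeartbeatRecv: time the last heartbeat request was received from this member
pingMs: round-trip time to this member in milliseconds
syncingTo: the member this one is syncing from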
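To make the numeric state values easier to read, they can be mapped to names. This is a small sketch under stated assumptions: summarizeMembers and STATE_NAMES are hypothetical names, the table lists the member states as named in MongoDB's documentation (state 8 is called DOWN there, while rs.status() prints it as "(not reachable/healthy)"), and the sample input is a trimmed version of the first rs.status() result.

```javascript
// Numeric replica set member states and their documented names.
var STATE_NAMES = {
  0: "STARTUP",
  1: "PRIMARY",
  2: "SECONDARY",
  3: "RECOVERING",
  5: "STARTUP2",
  6: "UNKNOWN",
  7: "ARBITER",
  8: "DOWN",
  9: "ROLLBACK",
  10: "REMOVED"
};

// Turn an rs.status()-shaped object into one readable line per member.
function summarizeMembers(status) {
  return status.members.map(function (m) {
    var label = STATE_NAMES[m.state] || ("state " + m.state);
    return m.name + " " + label + (m.health === 1 ? "" : " (unhealthy)");
  });
}

var sampleStatus = {
  members: [
    { name: "localhost:28010", health: 1, state: 2 },
    { name: "localhost:28011", health: 1, state: 5 },
    { name: "localhost:28012", health: 1, state: 5 }
  ]
};
var summary = summarizeMembers(sampleStatus);
```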
You can also check the Replica Set state with isMaster:
rs1:PRIMARY> rs.isMaster();
{
  "setName" : "rs1",
  "setVersion" : 1,
  "ismaster" : true,
  "secondary" : false,
  "hosts" : [
    "localhost:28010",
    "localhost:28012",
    "localhost:28011"
  ],
  "primary" : "localhost:28010",
  "me" : "localhost:28010",
  "maxBsonObjectSize" : 16777216,
  "maxMessageSizeBytes" : 48000000,
  "maxWriteBatchSize" : 1000,
  "localTime" : ISODate("2015-01-16T02:38:58.479Z"),
  "maxWireVersion" : 2,
  "minWireVersion" : 0,
  "ok" : 1
}
rs1:PRIMARY>
3.1 The replication log: oplog
A MongoDB Replica Set records write operations in a log called the oplog, which was introduced in the earlier tutorials. oplog.rs is a fixed-size capped collection in the local database that stores the operation log of the Replica Set. By default the oplog of a 64-bit MongoDB is fairly large: it is sized at 5% of the free disk space. The size can be changed with the mongod option --oplogSize.
rs1:PRIMARY> use local
switched to db local
rs1:PRIMARY> show collections
me
oplog.rs
startup_log
system.indexes
system.replset
rs1:PRIMARY> db.oplog.rs.find();
{ "ts" : Timestamp(1421375729, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
rs1:PRIMARY>
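An oplog document such as the one returned above can be read with a small decoder. The sketch below is illustrative only: describeOplogEntry and OP_TYPES are hypothetical names, the ts and h fields are left out of the sample objects because Timestamp and NumberLong exist only inside the mongo shell, and insertEntry is an invented example of what an insert into test.student might look like.

```javascript
// Single-letter operation codes used in the "op" field of oplog entries.
var OP_TYPES = {
  i: "insert",
  u: "update",
  d: "delete",
  c: "command",
  n: "no-op"
};

// Produce a short description such as "insert on test.student".
function describeOplogEntry(entry) {
  var kind = OP_TYPES[entry.op] || ("unknown (" + entry.op + ")");
  var target = entry.ns ? " on " + entry.ns : "";
  return kind + target;
}

// The entry written by rs.initiate(), without the shell-only ts and h fields.
var initEntry = { v: 2, op: "n", ns: "", o: { msg: "initiating set" } };
// Hypothetical entry for an insert into the student collection.
var insertEntry = { v: 2, op: "i", ns: "test.student", o: { name: "zhangsan", age: 20 } };

var initDescription = describeOplogEntry(initEntry);
var insertDescription = describeOplogEntry(insertEntry);
```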
Field reference:
ts: timestamp of the operation
op: operation type, one of:
i: insert
d: delete
u: update
n: no-op (as in the "initiating set" entry above)
ns: namespace, that is, the collection name the operation applies to
o: the content of the document
View the metadata of the primary's oplog:
rs1:PRIMARY> db.printReplicationInfo();
configured oplog size: 990MB
log length start to end: 0secs (0hrs)
oplog first event time: Fri Jan 16 2015 10:35:29 GMT+0800 (CST)
oplog last event time: Fri Jan 16 2015 10:35:29 GMT+0800 (CST)
now: Fri Jan 16 2015 10:45:18 GMT+0800 (CST)
rs1:PRIMARY>
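The configured oplog size of 990MB reported above is consistent with the 5% rule hitting a lower bound on a small virtual machine disk. The following sketch works through that arithmetic. It is an approximation, not MongoDB's actual sizing code: estimateOplogSizeMB is a hypothetical name, and the floor of 990 MB and cap of 50 GB are assumptions based on the commonly cited defaults for 64-bit Linux, exposed as parameters so they can be changed.

```javascript
// Estimate the default oplog size in MB from the free disk space in MB.
// percent, floorMB and capMB are assumptions (5%, 990 MB, 50 GB) that the
// caller may override.
function estimateOplogSizeMB(freeDiskMB, percent, floorMB, capMB) {
  percent = percent === undefined ? 5 : percent;
  floorMB = floorMB === undefined ? 990 : floorMB;
  capMB = capMB === undefined ? 50 * 1024 : capMB;
  var size = Math.floor(freeDiskMB * percent / 100);
  return Math.min(capMB, Math.max(floorMB, size));
}

var smallDisk = estimateOplogSizeMB(10000);    // 5% would be 500 MB, so the floor applies
var mediumDisk = estimateOplogSizeMB(100000);  // 5% of 100000 MB
var largeDisk = estimateOplogSizeMB(2000000);  // 5% would exceed the cap
```

An explicit --oplogSize value passed to mongod takes the place of this default calculation.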
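Field reference:
configured oplog size: the configured size of the oplog.
log length start to end: the time span covered by the entries currently in the oplog.
oplog first event time: the time of the first entry in the oplog.
oplog last event time: the time of the last entry in the oplog.
now: the current time.
View the sync status of the secondaries:
rs1:PRIMARY> db.printSlaveReplicationInfo();
source: localhost:28011
  syncedTo: Thu Jan 01 1970 08:00:00 GMT+0800 (CST)
  1421375729 secs (394826.59 hrs) behind the primary
source: localhost:28012
  syncedTo: Thu Jan 01 1970 08:00:00 GMT+0800 (CST)
  1421375729 secs (394826.59 hrs) behind the primary
rs1:PRIMARY>
Field reference:
source: host and port of the secondary
syncedTo: how far the secondary has synced, followed by how far it lags behind the primary.
In this output syncedTo is still the Unix epoch, which means the secondaries had not applied any operation when the command ran. The reported lag of 1421375729 seconds is therefore just the primary's last oplog timestamp measured from the epoch.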
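The "secs behind the primary" figure can be reproduced from optime dates by simple subtraction. This is a minimal sketch of that calculation, assuming an rs.status()-shaped input whose optimeDate values are JavaScript Date objects; replicationLagSeconds is a hypothetical name, and the sample data mirrors the situation above, where one member has not synced at all.

```javascript
// Compute, for every non-primary member, how many seconds its optimeDate
// trails the primary's optimeDate. Returns null if there is no primary.
function replicationLagSeconds(status) {
  var primary = null;
  var i;
  for (i = 0; i < status.members.length; i++) {
    if (status.members[i].state === 1) {
      primary = status.members[i];
    }
  }
  if (primary === null) {
    return null;
  }
  var lag = {};
  for (i = 0; i < status.members.length; i++) {
    var m = status.members[i];
    if (m !== primary) {
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

var lagSample = {
  members: [
    { name: "localhost:28010", state: 1, optimeDate: new Date(1421375729 * 1000) },
    { name: "localhost:28011", state: 5, optimeDate: new Date(0) },
    { name: "localhost:28012", state: 2, optimeDate: new Date(1421375729 * 1000) }
  ]
};
var lag = replicationLagSeconds(lagSample);
```

Here the member whose optimeDate is still the epoch shows the same 1421375729 second lag as in the shell output, while a caught-up member shows 0.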
3.2 Replication configuration
Besides the oplog collection, the local database contains another collection that stores the replication configuration: system.replset
rs1:PRIMARY> db.system.replset.find();
{ "_id" : "rs1", "version" : 1, "members" : [ { "_id" : 0, "host" : "localhost:28010" }, { "_id" : 1, "host" : "localhost:28011" }, { "_id" : 2, "host" : "localhost:28012" } ] }
rs1:PRIMARY>
This collection holds the configuration of the Replica Set. The same information can be viewed by running rs.conf() on any member.
3.3 Testing the Replica Set
Write and query tests
Connect to ports 28010, 28011 and 28012 in turn and try writing and reading data.
On 28010:
[root@localhost bin]# ./mongo --port 28010
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:28010/test
rs1:PRIMARY> db.student.insert({name:"zhangsan",age:20});
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> db.student.find();
{ "_id" : ObjectId("54b87ca7f663c819d621d590"), "name" : "zhangsan", "age" : 20 }
rs1:PRIMARY>
On 28011:
[root@localhost bin]# ./mongo --port 28011
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:28011/test
rs1:SECONDARY> show collections
2015-01-16T11:22:54.517+0800 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:131
The query fails because this member is a secondary, and a secondary rejects reads by default. To make it readable, call setSlaveOk() on the connection:
rs1:SECONDARY> db.getMongo().setSlaveOk();
rs1:SECONDARY> show collections;
student
system.indexes
rs1:SECONDARY>
Queries now work.
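Note that after connecting to a mongod the shell prompt starts with rs1:SECONDARY or rs1:PRIMARY, which shows whether the current session is on the PRIMARY or on a SECONDARY member of the replica set rs1.
Now query the student data:
rs1:SECONDARY> db.student.find();
{ "_id" : ObjectId("54b883fb7bd891605d9c300f"), "name" : "zhangsan", "age" : 20 }
rs1:SECONDARY>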
On port 28012:
[root@localhost bin]# ./mongo --port 28012
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:28012/test
rs1:SECONDARY> show collections
2015-01-16T11:27:04.747+0800 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:131
rs1:SECONDARY> db.getMongo().setSlaveOk();
rs1:SECONDARY> show collections
student
system.indexes
rs1:SECONDARY> db.student.find();
{ "_id" : ObjectId("54b883fb7bd891605d9c300f"), "name" : "zhangsan", "age" : 20 }
rs1:SECONDARY>
Attempt a write on port 28011:
[root@localhost bin]# ./mongo --port 28011
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:28011/test
rs1:SECONDARY> db.student.insert({name:"lisi",age:20});
WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })
The shell reports "not master", so the write is rejected. This matches the Replica Set principles explained in the previous two articles. Port 28012 behaves the same way. The test confirms that only the PRIMARY accepts writes; a SECONDARY can serve reads at most, and only after db.getMongo().setSlaveOk() has been called on the connection.
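The read and write behaviour observed in these tests can be condensed into a single rule. The sketch below encodes only what the tests above demonstrated; isOperationAllowed is a hypothetical name, and member states other than PRIMARY and SECONDARY are treated as serving nothing, which is an assumption made for simplicity.

```javascript
// Decide whether a member in the given state serves an operation.
// memberState: "PRIMARY", "SECONDARY", or any other stateStr value.
// operation: "read" or "write".
// slaveOk: whether setSlaveOk() has been called on the connection.
function isOperationAllowed(memberState, operation, slaveOk) {
  if (memberState === "PRIMARY") {
    return true; // the primary serves both reads and writes
  }
  if (memberState === "SECONDARY") {
    return operation === "read" && slaveOk === true;
  }
  return false; // assumption: other states serve neither reads nor writes
}

var primaryWrite = isOperationAllowed("PRIMARY", "write", false);
var secondaryReadDefault = isOperationAllowed("SECONDARY", "read", false);
var secondaryReadSlaveOk = isOperationAllowed("SECONDARY", "read", true);
var secondaryWrite = isOperationAllowed("SECONDARY", "write", true);
```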
Failover
One improvement of a replica set over traditional Master-Slave replication is automatic failover: if one member of the set is stopped, the remaining members automatically elect one of themselves as PRIMARY. For example, stop the current PRIMARY on port 28010 with kill -2 PID:
[root@localhost bin]# ps aux|grep mongod
root  6658 0.8 3.7 3175956 37508 ? Sl 11:06 0:12 ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r0 --fork --port 28010 --dbpath=/usr/local/mongodb/r0 --logpath=/usr/local/mongodb/log/r0.log --logappend
root  7461 0.7 3.7 3144172 37764 ? Sl 11:06 0:11 ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r1 --fork --port 28011 --dbpath=/usr/local/mongodb/r1 --logpath=/usr/local/mongodb/log/r1.log --logappend
root 28166 0.6 3.8 3144152 38520 ? Sl 11:10 0:08 ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r2 --fork --port 28012 --dbpath=/usr/local/mongodb/r2 --logpath=/usr/local/mongodb/log/r2.log --logappend
root 30833 0.0 0.0 103244 832 pts/1 S+ 11:31 0:00 grep mongod
[root@localhost bin]# kill -2 6658
[root@localhost bin]# ps aux|grep mongod
root  7461 0.7 3.7 3158520 37960 ? Sl 11:06 0:11 ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r1 --fork --port 28011 --dbpath=/usr/local/mongodb/r1 --logpath=/usr/local/mongodb/log/r1.log --logappend
root 28166 0.6 3.8 3154396 38616 ? Sl 11:10 0:08 ./mongod --replSet rs1 --keyFile=/usr/local/mongodb/key/r2 --fork --port 28012 --dbpath=/usr/local/mongodb/r2 --logpath=/usr/local/mongodb/log/r2.log --logappend
root 30869 0.0 0.0 103244 832 pts/1 S+ 11:31 0:00 grep mongod
[root@localhost bin]#
Now connect to the mongod on port 28011 and check the replica set status:
rs1:PRIMARY> rs.status()
{
  "set" : "rs1",
  "date" : ISODate("2015-01-16T03:33:15Z"),
  "myState" : 1,
  "members" : [
    {
      "_id" : 0,
      "name" : "localhost:28010",
      "health" : 0,
      "state" : 8,
      "stateStr" : "(not reachable/healthy)",
      "uptime" : 0,
      "optime" : Timestamp(1421378555, 1),
      "optimeDate" : ISODate("2015-01-16T03:22:35Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:33:14Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:31:50Z"),
      "pingMs" : 0
    },
    {
      "_id" : 1,
      "name" : "localhost:28011",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 1608,
      "optime" : Timestamp(1421378555, 1),
      "optimeDate" : ISODate("2015-01-16T03:22:35Z"),
      "electionTime" : Timestamp(1421379114, 1),
      "electionDate" : ISODate("2015-01-16T03:31:54Z"),
      "self" : true
    },
    {
      "_id" : 2,
      "name" : "localhost:28012",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 1358,
      "optime" : Timestamp(1421378555, 1),
      "optimeDate" : ISODate("2015-01-16T03:22:35Z"),
      "lastHeartbeat" : ISODate("2015-01-16T03:33:15Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-16T03:33:13Z"),
      "pingMs" : 0,
      "lastHeartbeatMessage" : "syncing to: localhost:28011",
      "syncingTo" : "localhost:28011"
    }
  ],
  "ok" : 1
}
rs1:PRIMARY>
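The state of 28010 is now 8, described as not reachable, and its health is 0. The state of 28011 has changed to 1, described as PRIMARY. The set therefore now consists of 28011 as PRIMARY and 28012 as SECONDARY, with 28010 down. As this test shows, when the service on 28010 went down the set automatically elected the member on port 28011 as the new PRIMARY, and this failover mechanism greatly improves the stability of the system. Writes can now be made on 28011, while 28012 serves reads only (not shown here).
rs1:PRIMARY> use test
switched to db test
rs1:PRIMARY> db.student.insert({name:"lisi",age:20});
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> db.student.find();
{ "_id" : ObjectId("54b883fb7bd891605d9c300f"), "name" : "zhangsan", "age" : 20 }
{ "_id" : ObjectId("54b8876aad5e04c1fe460154"), "name" : "lisi", "age" : 20 }
rs1:PRIMARY>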
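The article does not spell out why the two surviving members were able to elect a new PRIMARY. In MongoDB's election protocol a candidate needs the votes of a majority of the voting members, and two out of three is a majority. The sketch below illustrates that arithmetic only; majorityNeeded and canElectPrimary are hypothetical names, and the model assumes every member has one vote and ignores priorities and other eligibility rules.

```javascript
// Smallest number of votes that forms a strict majority.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True when the members that can still reach each other form a majority.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

var afterOneFailure = canElectPrimary(3, 2);  // the situation tested above
var afterTwoFailures = canElectPrimary(3, 1); // a lone survivor has no majority
```

Under this model a three-member set tolerates the loss of one member, which is one reason the three-member layout is the common choice mentioned in the architecture section.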
This concludes the chapter on deploying and testing a Replica Set. It covered the deployment process together with tests of automatic failover and read/write splitting and the principles behind them. Later chapters continue with adding and removing Replica Set members dynamically.