openfalcon-mongodb监控插件

下载地址

https://github.com/ZhuoRoger/mongomon

修改版本下载

mongodb自己连接测试

./mongo -u admin2 -p “123” –authenticationDatabase admin

配置 conf/mongomon.conf

items:
-  {port: 27017, user: "admin2",password: "123"}

程序 bin/mongodb_monitor.py调整

  • hostname调整,一般我喜欢调整成ip地址
  • 这里写图片描述

mongodb monitor cron

// root下面用这个 * * * * * user (cd /opt/abc/agent/falcon-plugins/mongomon/bin/ && python  mongodb_monitor.py  > /dev/null)
* * * * * cd /opt/abc/agent/falcon-plugins/mongomon/bin/ && python  mongodb_monitor.py  >> /dev/null

mongodb创建admin账户,要有root权限,否则不会被授权

use admin
db.createUser(
  {
  user: "admin2",
  pwd: "123",
  roles: [ { role: "root", db: "admin" } ]
  });
exit;

一个坑

  • 领导要求监控uptime,如果这个300分钟内没有变化就说明mongodb不工作了,需要报警,zabbix里面有这个插件,那么falcon是否也可以呢?
  • 调整之前
    这里写图片描述

  • 调整之后
    这里写图片描述
    经过调查发现,falcon里面对于uptime这个指标,是GAUGE类型,就是说,mongodb运行一分钟,我会有数值60,一个小时就是3600,运行个几天,数据大的不行,而且关键这个东西没法做报警,不是一个相对值,所以我稍微修改了下作者的代码,把uptime加到counter列表里面,这样,每次我们就会得到一个稳定的1,因为假设mongodb一分钟内一直在运行,两次踩点之间的差是60,然后除以60,最后得到1,也就是说,我们最终得到的值只可能是1左右,否则就是不工作。

又一个坑

这里写图片描述

  • 主要是,作者本来插件已经支持单机和replca-set的监控,但是在replica-set的监控中,由于引入了mongodb的 Timestamp,这个是bson里面的,必须安装python的一个插件,以及修改json处理中少了 [“ts”], 两个地方会导致最终采集错误,修改后的整体代码如下,只需要覆盖 mongodb_server.py即可。

  • 小伙伴们可以继续测试下,有问题请留言。

  • 不要忘了pip install mongo-connector 哦。 参考 http://nullege.com/codes/search/bson.timestamp.Timestamp
  • 总结下拍错思路,根据报错信息,然后加上type(something)去看到底是个啥鬼。
#! /bin/env python
#-*- coding:utf8 -*-
# Filename: mongodb_server.py

import sys
import os
from bson.timestamp import Timestamp

import pymongo
from pymongo import MongoClient

class mongodbMonitor(object):

    def mongodb_connect(self,host=None, port=None, user=None, password=None):
#       try:
            conn = MongoClient(host, port, serverSelectionTimeoutMS=1000)  # conntion timeout 1 sec. 

            if user and password:
                db_admin = conn["admin"]
                if not db_admin.authenticate(user,password):
                            pass;
            conn.server_info()  
            print conn
#       except :
#           e = sys.exc_info()[0]
#           return e, None

            return 0,conn

    #data node(1): standalone, replset primary, replset secondary. mongos(2), mongoConfigSrv(3)
    def get_mongo_role(self, conn):

        mongo_role = 1  
        conn.server_info() 
        if (conn.is_mongos):
            mongo_role = 2 
        elif ("chunks" in conn.get_database("config").collection_names()): #  Role is a config servers?  not mongos and has config.chunks collections. it's a config server.
            mongo_role = 3  
        return mongo_role

    def get_mongo_monitor_data(self, conn):

        mongo_monitor_dict ={}
        mongo_monitor_dict["mongo_local_alive"] = 1  # mongo local alive metric for all nodes.
                mongo_role = self.get_mongo_role(conn)

                if(mongo_role == 1):
                        mongodb_role,serverStatus_dict = self.serverStatus(conn)
                        mongo_monitor_dict.update(serverStatus_dict)
                        repl_status_dict = {}
                        if (mongodb_role == "master" or mongodb_role == "secondary"):
                                repl_status_dict = self.repl_status(conn)
                                mongo_monitor_dict.update(repl_status_dict)
            else:
                print "this is standalone node"
        elif(mongo_role == 2): # mongos
            shards_dict = self.shard_status(conn)
            mongo_monitor_dict.update(shards_dict)
        return mongo_monitor_dict

    def  serverStatus(self,connection):

        serverStatus = connection.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))

        mongodb_server_dict = {}  # mongodb server status metric for upload to falcon

        mongo_version = serverStatus["version"]
        #uptime metric
        mongodb_server_dict["uptime"] = int(serverStatus["uptime"])

        #asserts section metrics
        mongo_asserts = serverStatus["asserts"]
        for asserts_key in mongo_asserts.keys():
            asserts_key_name = "asserts_" + asserts_key
            mongodb_server_dict[asserts_key_name] = mongo_asserts[asserts_key]      

        ### "extra_info" section metrics: page_faults.  falcon counter type.
        if serverStatus.has_key("extra_info"):
            mongodb_server_dict["page_faults"] = serverStatus["extra_info"]["page_faults"]

        ### "connections" section metrics
        current_conn =  serverStatus["connections"]["current"]
        available_conn = serverStatus["connections"]["available"]

        mongodb_server_dict["connections_current"] = current_conn
        mongodb_server_dict["connections_available"] = available_conn

        # mongodb connection used percent 
        mongodb_server_dict["connections_used_percent"] = int((current_conn/(current_conn + available_conn)*100)) 

        # total created from mongodb started.  COUNTER metric
        mongodb_server_dict["connections_totalCreated"] =  serverStatus["connections"]["totalCreated"]


        #  "globalLock" currentQueue

        mongodb_server_dict["globalLock_currentQueue_total"] = serverStatus["globalLock"]["currentQueue"]["total"]
        mongodb_server_dict["globalLock_currentQueue_readers"] = serverStatus["globalLock"]["currentQueue"]["readers"]
        mongodb_server_dict["globalLock_currentQueue_writers"] = serverStatus["globalLock"]["currentQueue"]["writers"]

        # "locks" section, Changed in version 3.0
        if serverStatus.has_key("locks") and mongo_version >"3.0":
            locks_dict_keys = serverStatus["locks"].keys()
            for lock_scope in locks_dict_keys:  # Global, Database,Collection,Oplog
                for lock_metric  in serverStatus["locks"][lock_scope]:
                    for lock_type in serverStatus["locks"][lock_scope][lock_metric]:

                        if lock_type == "R":
                            lock_name = "Slock"
                        elif lock_type == "W":
                            lock_name = "Xlock"
                        elif lock_type == "r":
                            lock_name = "ISlock"
                        elif lock_type == "w":
                            lock_name = "IXlock"
                        lock_metric_key = "locks_" + lock_scope + "_" + lock_metric + "_" + lock_name       
                        mongodb_server_dict[lock_metric_key] =  serverStatus["locks"][lock_scope][lock_metric][lock_type]

        # "network" section metrics: bytesIn, bytesOut, numRequests;  counter type
        if serverStatus.has_key("network"):
            for network_metric in serverStatus["network"].keys():
                network_metric_key = "network_"  + network_metric   # network metric key for upload
                mongodb_server_dict[network_metric_key] = serverStatus["network"][network_metric]


        ### "opcounters" section metrics: insert, query, update, delete, getmore, command. couter type
        if serverStatus.has_key("opcounters"):
            for opcounters_metric in serverStatus["opcounters"].keys():
                opcounters_metric_key = "opcounters_" + opcounters_metric 
                mongodb_server_dict[opcounters_metric_key] = serverStatus["opcounters"][opcounters_metric]


        ### "opcountersRepl" section metrics: insert, query, update, delete, getmore, command. couter type
        if serverStatus.has_key("opcountersRepl"):
            for opcountersRepl_metric in serverStatus["opcountersRepl"].keys():
                opcountersRepl_metric_key = "opcountersRepl_" + opcountersRepl_metric 
                mongodb_server_dict[opcountersRepl_metric_key] = serverStatus["opcounters"][opcountersRepl_metric]


        ### "mem" section metrics: 
        if serverStatus.has_key("mem"):
            for mem_metric in serverStatus["mem"].keys():
                mem_metric_key = "mem_"  + mem_metric
                if( mem_metric in ["bits","supported"] ):
                    mongodb_server_dict[mem_metric_key] = serverStatus["mem"][mem_metric]
                else:
                    mongodb_server_dict[mem_metric_key] = serverStatus["mem"][mem_metric]*1024*1024

        ### "dur" section metrics:
        if serverStatus.has_key("dur"):
            mongodb_server_dict["dur_journaledBytes"] = serverStatus["dur"]["journaledMB"]*1024*1024
            mongodb_server_dict["dur_writeToDataFilesBytes"] = serverStatus["dur"]["writeToDataFilesMB"]*1024*1024
            mongodb_server_dict["dur_commitsInWriteLock"] = serverStatus["dur"]["commitsInWriteLock"]

        ### "repl" section
        mongodb_role = ""
        if (serverStatus.has_key("repl") and  serverStatus["repl"].has_key("secondary")):
            if serverStatus["repl"]["ismaster"]:
                mongodb_role = "master"
            if  serverStatus["repl"]["secondary"]:
                mongodb_role = "secondary"
        else: # not Replica sets mode
            mongodb_role = "standalone" 


        ### "backgroundFlushing" section metrics, only for MMAPv1
        if serverStatus.has_key("backgroundFlushing"):
            for bgFlush_metric in serverStatus["backgroundFlushing"].keys():
                if bgFlush_metric != "last_finished":  # discard last_finished metric 
                    bgFlush_metric_key = "backgroundFlushing_" + bgFlush_metric
                    mongodb_server_dict[bgFlush_metric_key] = serverStatus["backgroundFlushing"][bgFlush_metric]

        ### cursor from "metrics" section
        if serverStatus.has_key("metrics") and  serverStatus["metrics"].has_key("cursor"):
            cursor_status = serverStatus["metrics"]["cursor"]
            mongodb_server_dict["cursor_timedOut"] = cursor_status["timedOut"]  
            mongodb_server_dict["cursor_open_noTimeout"] =  cursor_status["open"]["noTimeout"]
            mongodb_server_dict["cursor_open_pinned"] =  cursor_status["open"]["pinned"]
            mongodb_server_dict["cursor_open_total"] =  cursor_status["open"]["total"]


        ### "wiredTiger" section 
        if serverStatus.has_key("wiredTiger"):
            serverStatus_wt = serverStatus["wiredTiger"]

            #cache 
            wt_cache = serverStatus_wt["cache"]
            mongodb_server_dict["wt_cache_used_total_bytes"] = wt_cache["bytes currently in the cache"]
            mongodb_server_dict["wt_cache_dirty_bytes"] = wt_cache["tracked dirty bytes in the cache"]
            mongodb_server_dict["wt_cache_readinto_bytes"] = wt_cache["bytes read into cache"]
            mongodb_server_dict["wt_cache_writtenfrom_bytes"] = wt_cache["bytes written from cache"]

            #concurrentTransactions
            wt_concurrentTransactions = serverStatus_wt["concurrentTransactions"]
            mongodb_server_dict["wt_concurrentTransactions_write"] = wt_concurrentTransactions["write"]["available"]
            mongodb_server_dict["wt_concurrentTransactions_read"] = wt_concurrentTransactions["read"]["available"]  

            #"block-manager" section
            wt_block_manager = serverStatus_wt["block-manager"]
            mongodb_server_dict["wt_bm_bytes_read"] = wt_block_manager["bytes read"]
            mongodb_server_dict["wt_bm_bytes_written"] = wt_block_manager["bytes written"]
            mongodb_server_dict["wt_bm_blocks_read"] = wt_block_manager["blocks read" ]
            mongodb_server_dict["wt_bm_blocks_written"] = wt_block_manager["blocks written"]

        ### "rocksdb" engine 
        if serverStatus.has_key("rocksdb"):
            serverStatus_rocksdb = serverStatus["rocksdb"]

            mongodb_server_dict["rocksdb_num_immutable_mem_table"]  = serverStatus_rocksdb["num-immutable-mem-table"]
            mongodb_server_dict["rocksdb_mem_table_flush_pending"] = serverStatus_rocksdb["mem-table-flush-pending"]
            mongodb_server_dict["rocksdb_compaction_pending"] = serverStatus_rocksdb["compaction-pending"]
                        mongodb_server_dict["rocksdb_background_errors"] = serverStatus_rocksdb["background-errors"]
                        mongodb_server_dict["rocksdb_num_entries_active_mem_table"] = serverStatus_rocksdb["num-entries-active-mem-table"]
                        mongodb_server_dict["rocksdb_num_entries_imm_mem_tables"] = serverStatus_rocksdb["num-entries-imm-mem-tables"]
                        mongodb_server_dict["rocksdb_num_snapshots"] = serverStatus_rocksdb["num-snapshots"]
                        mongodb_server_dict["rocksdb_oldest_snapshot_time"] = serverStatus_rocksdb["oldest-snapshot-time"]
                        mongodb_server_dict["rocksdb_num_live_versions"] = serverStatus_rocksdb["num-live-versions"]
            mongodb_server_dict["rocksdb_total_live_recovery_units"] = serverStatus_rocksdb["total-live-recovery-units"]

        ### "PerconaFT" engine
        if serverStatus.has_key("PerconaFT"):
            serverStatus_PerconaFT = serverStatus["PerconaFT"]

            mongodb_server_dict["PerconaFT_log_count"] = serverStatus_PerconaFT["log"]["count"]
            mongodb_server_dict["PerconaFT_log_time"] = serverStatus_PerconaFT["log"]["time"]
            mongodb_server_dict["PerconaFT_log_bytes"] = serverStatus_PerconaFT["log"]["bytes"]

            mongodb_server_dict["PerconaFT_fsync_count"] = serverStatus_PerconaFT["fsync"]["count"]
            mongodb_server_dict["PerconaFT_fsync_time"] =  serverStatus_PerconaFT["fsync"]["time"]

            ### cachetable
            PerconaFT_cachetable = serverStatus_PerconaFT["cachetable"]
            mongodb_server_dict["PerconaFT_cachetable_size_current"] = PerconaFT_cachetable["size"]["current"] 
            mongodb_server_dict["PerconaFT_cachetable_size_writing"]  = PerconaFT_cachetable["size"]["writing"]
                        mongodb_server_dict["PerconaFT_cachetable_size_limit"]  = PerconaFT_cachetable["size"]["limit"]


            ### PerconaFT checkpoint            
            PerconaFT_checkpoint = serverStatus_PerconaFT["checkpoint"]
            mongodb_server_dict["PerconaFT_checkpoint_count"] = PerconaFT_checkpoint["count"]
            mongodb_server_dict["PerconaFT_checkpoint_time"] = PerconaFT_checkpoint["time"]

            mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_count"] = PerconaFT_checkpoint["write"]["nonleaf"]["count"]
            mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_time"] = PerconaFT_checkpoint["write"]["nonleaf"]["time"]     
            mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_bytes_compressed"] = PerconaFT_checkpoint["write"]["nonleaf"]["bytes"]["compressed"]
            mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_bytes_uncompressed"] = PerconaFT_checkpoint["write"]["nonleaf"]["bytes"]["uncompressed"]  
                        mongodb_server_dict["PerconaFT_checkpoint_write_leaf_count"] = PerconaFT_checkpoint["write"]["leaf"]["count"]
                        mongodb_server_dict["PerconaFT_checkpoint_write_leaf_time"] = PerconaFT_checkpoint["write"]["leaf"]["time"]     
                        mongodb_server_dict["PerconaFT_checkpoint_write_leaf_bytes_compressed"] = PerconaFT_checkpoint["write"]["leaf"]["bytes"]["compressed"]
                        mongodb_server_dict["PerconaFT_checkpoint_write_leaf_bytes_uncompressed"] = PerconaFT_checkpoint["write"]["leaf"]["bytes"]["uncompressed"]  


            ### serializeTime

            for serializeTime_item  in serverStatus_PerconaFT["serializeTime"]:
                prefix = "PerconaFT_serializeTime_" + serializeTime_item
                for serializeTime_key in serverStatus_PerconaFT["serializeTime"][serializeTime_item]:
                    key_name = prefix + "_" + serializeTime_key
                        mongodb_server_dict[key_name] = serverStatus_PerconaFT["serializeTime"][serializeTime_item][serializeTime_key]  

            ### PerconaFT  compressionRatio
            for compressionRatio_item in serverStatus_PerconaFT["compressionRatio"]:
                key_name = "PerconaFT_compressionRatio_" + compressionRatio_item
                mongodb_server_dict[key_name] = serverStatus_PerconaFT["compressionRatio"][compressionRatio_item]

        return (mongodb_role, mongodb_server_dict)

        def repl_status(self,connection):
            replStatus = connection.admin.command("replSetGetStatus")
        print replStatus
                repl_status_dict = {}  # repl set metric dict

                # myState "1" for PRIMARY , "2" for  SECONDARY, "3":
                repl_status_dict["repl_myState"] = replStatus["myState"]

                repl_status_members = replStatus["members"]

                master_optime = 0 # Master oplog ops time
                myself_optime = 0 # SECONDARY oplog ops time

        print "开始打印repl_status_members"
        print repl_status_members
        print "结束打印repl_status_members"
                for repl_member in repl_status_members:
                        if repl_member.has_key("self") and repl_member["self"]:
                                repl_status_dict["repl_health"] = repl_member["health"]
                                #repl_status_dict["repl_optime"] = repl_member["optime"].time
                #print "value of optime ts is:"
                #print type(repl_member["optime"])
                #print type(repl_member["optime"]["ts"])
                print repl_member["optime"]["ts"].time
                                repl_status_dict["repl_optime"] = repl_member["optime"]["ts"].time
                                if repl_member.has_key("repl_electionTime"):
                    repl_status_dict["repl_electionTime"] = repl_member["electionTime"].time
                                if repl_member.has_key("repl_configVersion"):
                    repl_status_dict["repl_configVersion"] = repl_member["configVersion"]
                                #myself_optime = repl_member["optime"].time
                                myself_optime = repl_member["optime"]["ts"].time
                        if (replStatus["myState"] == 2 and repl_member["state"] == 1 ):  # CONDARY ,get repl lag
                                master_optime = repl_member["optime"]["ts"].time
                if replStatus["myState"] == 2 :

                        repl_status_dict["repl_lag"] = master_optime - myself_optime


                ### oplog window  hours

                oplog_collection = connection["local"]["oplog.rs"]

                oplog_tFirst =   oplog_collection.find({},{"ts":1}).sort('$natural',pymongo.ASCENDING).limit(1).next()
                oplog_tLast = oplog_collection.find({},{"ts":1}).sort('$natural',pymongo.DESCENDING).limit(1).next()


                oplogrs_collstats =   connection["local"].command("collstats", "oplog.rs")


                window_multiple = 1   ##oplog.rs collections is not full     
                if oplogrs_collstats.has_key("maxSize"):
                        window_multiple = oplogrs_collstats["maxSize"]/(oplogrs_collstats["count"] * oplogrs_collstats["avgObjSize"])
                else:
                        window_multiple =  oplogrs_collstats["storageSize"]/(oplogrs_collstats["count"] * oplogrs_collstats["avgObjSize"])

                #oplog_window  .xx hours
                oplog_window = round((oplog_tLast["ts"].time - oplog_tFirst["ts"].time)/3600.0,2) * window_multiple  # full


                repl_status_dict["repl_oplog_window"] = oplog_window

                return repl_status_dict

    # only for mongos node
    def shard_status(self, conn):  

        config_db = conn["config"]

            settings_col = config_db["settings"]

            balancer_doc = settings_col.find_one({'_id':'balancer'})

        shards_dict = {}
            if balancer_doc is  None:
                    shards_dict["shards_BalancerState"] = 1
            elif balancer_doc["stopped"]:  
                    shards_dict["shards_BalancerState"] = 0
            else: 
                    shards_dict["shards_BalancerState"] = 1

            # shards_activeWindow metric,0: without setting, 1:setting 
            # shards_activeWindow_start  metric,  { "start" : "23:30", "stop" : "6:00" } :  23.30 for  23:30 
            # shards_activeWindow_stop metric

            if balancer_doc is  None:
                    shards_dict["shards_activeWindow"] = 0

            elif balancer_doc.has_key("activeWindow"):
                    shards_dict["shards_activeWindow"] = 1
                    if balancer_doc["activeWindow"].has_key("start"):
                            window_start = balancer_doc["activeWindow"]["start"]
                            shards_dict["shards_activeWindow_start"] =  window_start.replace(":",".")

                    if balancer_doc["activeWindow"].has_key("stop"):
                            window_stop  = balancer_doc["activeWindow"]["stop"]
                            shards_dict["shards_activeWindow_stop"] = window_stop.replace(":",".")

            # shards_chunkSize metric
            chunksize_doc = settings_col.find_one({"_id" : "chunksize"})
            if chunksize_doc is not None:
                    shards_dict["shards_chunkSize"] = chunksize_doc["value"]

            # shards_isBalancerRunning metric
            locks_col = config_db["locks"]
            balancer_lock_doc = locks_col.find_one({'_id':'balancer'})

            if balancer_lock_doc is None:
                    print "config.locks collection empty or missing. be sure you are connected to a mongos"
                    shards_dict["shards_isBalancerRunning"] = 0
            elif balancer_lock_doc["state"] > 0:
                    shards_dict["shards_isBalancerRunning"] = 1
            else:
                    shards_dict["shards_isBalancerRunning"] = 0

            # shards_size metric  

            shards_col = config_db["shards"]
            shards_dict["shards_size"] = shards_col.count()

            # shards_mongosSize metric
            mongos_col = config_db["mongos"]
            shards_dict["shards_mongosSize"] = mongos_col.count()

        return shards_dict

有用的指标

uptime

uptime/mongo=27017

opcounters

opcounters_insert/mongo=27017
opcounters_query/mongo=27017
opcounters_update/mongo=27017
opcounters_delete/mongo=27017
opcounters_getmore/mongo=27017
opcounters_command/mongo=27017

connections

connections_available/mongo=27017
connections_current/mongo=27017
connections_totalCreated/mongo=27017
connections_used_percent/mongo=27017

附录

评论 18
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值