Mesos: Logging and Monitoring

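Mesos itself provides powerful logging and monitoring features, and some frameworks also offer monitoring for the tasks they run.
Through these interfaces, users can query the state of the cluster in real time.
Log files are written to /var/log/mesos by default, with a different suffix for each log level (for example .INFO, .WARNING, .ERROR).
Users can consult these logs to debug problems they run into.
It is generally recommended to set the log location with the --log_dir option and to monitor the logs with a log analysis engine.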
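
As a rough illustration of the kind of check a log analysis tool performs, the sketch below counts log lines per severity. It relies on the glog line format that Mesos uses, where each line starts with a severity letter (I, W, E or F) followed by the date. The function name and the sample lines in the usage note are made up for illustration.

```python
from collections import Counter

# glog severity letters that start each log line.
SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def count_by_severity(lines):
    """Count log lines per severity, based on the first character of each line."""
    counts = Counter()
    for line in lines:
        # A glog line looks like "I0102 03:04:05.678901 1234 file.cc:56] message".
        # Anything else (for example a continuation line) is skipped.
        if len(line) > 1 and line[0] in SEVERITIES and line[1].isdigit():
            counts[SEVERITIES[line[0]]] += 1
    return counts
```

The function accepts any iterable of strings, so an open file object works as well as a list. The exact file names under the log directory depend on the installation, so the path passed to `open()` is left to the reader.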

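Mesos provides convenient monitoring endpoints that let users inspect the state of each node in the cluster.
The monitoring of the two node types is described below.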

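Master node monitoring

(1) Master node
The address http://MASTER_NODE:5050/metrics/snapshot returns the Mesos master's status statistics, including resource usage (CPU, disk, memory), system state, slave nodes, frameworks, and task states.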

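The examples below use jq to pretty-print the JSON output. On CentOS/RHEL 7 it can be installed from the EPEL repository:

wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
yum repolist
yum install jq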

[root@adson ~]# curl -s http://adson:5050/metrics/snapshot | jq .
{
  "allocator/event_queue_dispatches": 5,
  "allocator/mesos/allocation_run_latency_ms": 0.011008,
  "allocator/mesos/allocation_run_latency_ms/count": 1000,
  "allocator/mesos/allocation_run_latency_ms/max": 0.305152,
  "allocator/mesos/allocation_run_latency_ms/min": 0,
  "allocator/mesos/allocation_run_latency_ms/p50": 0.011008,
  "allocator/mesos/allocation_run_latency_ms/p90": 0.018176,
  "allocator/mesos/allocation_run_latency_ms/p95": 0.0220544,
  "allocator/mesos/allocation_run_latency_ms/p99": 0.05301248,
  "allocator/mesos/allocation_run_latency_ms/p999": 0.100045311999995,
  "allocator/mesos/allocation_run_latency_ms/p9999": 0.284641331200014,
  "allocator/mesos/allocation_run_ms": 0.208896,
  "allocator/mesos/allocation_run_ms/count": 1000,
  "allocator/mesos/allocation_run_ms/max": 1.617152,
  "allocator/mesos/allocation_run_ms/min": 0,
  "allocator/mesos/allocation_run_ms/p50": 0.169984,
  "allocator/mesos/allocation_run_ms/p90": 0.312064,
  "allocator/mesos/allocation_run_ms/p95": 0.3840512,
  "allocator/mesos/allocation_run_ms/p99": 0.77025792,
  "allocator/mesos/allocation_run_ms/p999": 1.426111232,
  "allocator/mesos/allocation_run_ms/p9999": 1.59804792320001,
  "allocator/mesos/allocation_runs": 16449,
  "allocator/mesos/event_queue_dispatches": 0,
  "allocator/mesos/offer_filters/roles/*/active": 2,
  "allocator/mesos/resources/cpus/offered_or_allocated": 0.5,
  "allocator/mesos/resources/cpus/total": 2,
  "allocator/mesos/resources/disk/offered_or_allocated": 0,
  "allocator/mesos/resources/disk/total": 44224,
  "allocator/mesos/resources/mem/offered_or_allocated": 32,
  "allocator/mesos/resources/mem/total": 3921,
  "allocator/mesos/roles/*/shares/dominant": 0.25,
  "master/cpus_percent": 0.25,
  "master/cpus_revocable_percent": 0,
  "master/cpus_revocable_total": 0,
  "master/cpus_revocable_used": 0,
  "master/cpus_total": 2,
  "master/cpus_used": 0.5,
  "master/disk_percent": 0,
  "master/disk_revocable_percent": 0,
  "master/disk_revocable_total": 0,
  "master/disk_revocable_used": 0,
  "master/disk_total": 44224,
  ...
  "master/messages_update_slave": 57,
  "master/outstanding_offers": 0,
  "master/recovery_slave_removals": 0,
  "master/slave_registrations": 3,
  "master/slave_removals": 9,
  "master/slave_removals/reason_registered": 0,
  "master/slave_removals/reason_unhealthy": 9,
  "master/slave_removals/reason_unregistered": 0,
  "master/slave_reregistrations": 8,
  "master/slave_shutdowns_canceled": 0,
  "master/slave_shutdowns_completed": 0,
  "master/slave_shutdowns_scheduled": 0,
  "master/slave_unreachable_canceled": 0,
  "master/slave_unreachable_completed": 9,
  "master/slave_unreachable_scheduled": 9,
  "master/slaves_active": 2,
  "master/slaves_connected": 2,
  "master/slaves_disconnected": 0,
  "master/slaves_inactive": 0,
  "master/slaves_unreachable": 1,
  "master/task_dropped/source_slave/reason_reconciliation": 1,
  "master/task_failed/source_slave/reason_container_launch_failed": 5,
  "master/task_failed/source_slave/reason_executor_terminated": 17,
  "master/task_unreachable/source_master/reason_slave_removed": 1,
  "master/tasks_dropped": 1,
  "master/tasks_error": 0,
  "master/tasks_failed": 22,
  "master/tasks_finished": 158,
  "master/tasks_gone": 0,
  "master/tasks_gone_by_operator": 0,
  "master/tasks_killed": 0,
  "master/tasks_killing": 0,
  "master/tasks_lost": 0,
  "master/tasks_running": 1,
  "master/tasks_staging": 0,
  "master/tasks_starting": 0,
  "master/tasks_unreachable": 0,
  "master/uptime_secs": 16087.230494976,
  "master/valid_executor_to_framework_messages": 0,
  "master/valid_framework_to_executor_messages": 0,
  "master/valid_operation_status_update_acknowledgements": 0,
  "master/valid_status_update_acknowledgements": 507,
  "master/valid_status_updates": 507,
  "registrar/log/ensemble_size": 2,
  "registrar/log/recovered": 1,
  "registrar/queued_operations": 0,
  "registrar/registry_size_bytes": 501,
  "registrar/state_fetch_ms": 21.344,
  "registrar/state_store_ms": 31.341056,
  "registrar/state_store_ms/count": 21,
  "registrar/state_store_ms/max": 738.861824,
  "registrar/state_store_ms/min": 9.018112,
  "registrar/state_store_ms/p50": 24.0768,
  "registrar/state_store_ms/p90": 65.156096,
  "registrar/state_store_ms/p95": 94.180096,
  "registrar/state_store_ms/p99": 609.9254784,
  "registrar/state_store_ms/p999": 725.96818944,
  "registrar/state_store_ms/p9999": 737.572460544001,
  "system/cpus_total": 1,
  "system/load_15min": 0.2,
  "system/load_1min": 0.55,
  "system/load_5min": 0.26,
  "system/mem_free_bytes": 215691264,
  "system/mem_total_bytes": 4137840640
}
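
The same snapshot can also be read from a script rather than with curl and jq. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the key pattern `<prefix>/<resource>_total` and `<prefix>/<resource>_used` is taken from the output shown above.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(base_url, timeout=5.0):
    """Fetch /metrics/snapshot from a master or slave and return it as a dict."""
    url = base_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def usage_ratio(snapshot, prefix, resource):
    """Return used/total for one resource, or None if either value is unavailable."""
    total = snapshot.get(f"{prefix}/{resource}_total")
    used = snapshot.get(f"{prefix}/{resource}_used")
    if not total or used is None:
        # Missing key or a total of zero: no meaningful ratio.
        return None
    return used / total
```

For example, `usage_ratio(fetch_snapshot("http://MASTER_NODE:5050"), "master", "cpus")` divides `master/cpus_used` by `master/cpus_total`. With the values shown above (0.5 and 2), this gives 0.25, which agrees with the reported `master/cpus_percent`.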

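Slave node monitoring

(2) Slave node
The address http://SLAVE_NODE:5051/metrics/snapshot returns the Mesos slave's status statistics, including resources, system state, and the status of various messages.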
[root@docker1 ~]# curl -s http://192.168.6.32:5051/metrics/snapshot | jq .
{
  "containerizer/fetcher/cache_size_total_bytes": 2147483648,
  "containerizer/fetcher/cache_size_used_bytes": 0,
  "containerizer/fetcher/task_fetches_failed": 0,
  "containerizer/fetcher/task_fetches_succeeded": 28,
  "containerizer/mesos/container_destroy_errors": 0,
  "containerizer/mesos/provisioner/bind/remove_rootfs_errors": 0,
  "containerizer/mesos/provisioner/remove_container_errors": 0,
  "gc/path_removals_failed": 0,
  "gc/path_removals_pending": 0,
  "gc/path_removals_succeeded": 205,
  "resource_provider_manager/subscribed": 0,
  "slave/container_launch_errors": 5,
  "slave/cpus_percent": 0.6,
  "slave/cpus_revocable_percent": 0,
  "slave/cpus_revocable_total": 0,
  "slave/cpus_revocable_used": 0,
  "slave/cpus_total": 1,
  "slave/cpus_used": 0.6,
  "slave/disk_percent": 0,
  "slave/disk_revocable_percent": 0,
  "slave/disk_revocable_total": 0,
  "slave/disk_revocable_used": 0,
  "slave/disk_total": 3167,
  "slave/disk_used": 0,
  "slave/executor_directory_max_allowed_age_secs": 0,
  "slave/executors_preempted": 0,
  "slave/executors_registering": 0,
  "slave/executors_running": 1,
  "slave/executors_terminated": 77,
  "slave/executors_terminating": 0,
  "slave/frameworks_active": 1,
  "slave/gpus_percent": 0,
  "slave/gpus_revocable_percent": 0,
  "slave/gpus_revocable_total": 0,
  "slave/gpus_revocable_used": 0,
  "slave/gpus_total": 0,
  "slave/gpus_used": 0,
  "slave/invalid_framework_messages": 0,
  "slave/invalid_status_updates": 0,
  "slave/mem_percent": 0.0640640640640641,
  "slave/mem_revocable_percent": 0,
  "slave/mem_revocable_total": 0,
  "slave/mem_revocable_used": 0,
  "slave/mem_total": 999,
  "slave/mem_used": 64,
  "slave/recovery_errors": 0,
  "slave/recovery_time_secs": 0.287320064,
  "slave/registered": 1,
  "slave/tasks_failed": 5,
  "slave/tasks_finished": 72,
  "slave/tasks_gone": 0,
  "slave/tasks_killed": 0,
  "slave/tasks_killing": 0,
  "slave/tasks_lost": 0,
  "slave/tasks_running": 1,
  "slave/tasks_staging": 0,
  "slave/tasks_starting": 0,
  "slave/uptime_secs": 10874.96821888,
  "slave/valid_framework_messages": 0,
  "slave/valid_status_updates": 73,
  "system/cpus_total": 1,
  "system/load_15min": 0.05,
  "system/load_1min": 0.02,
  "system/load_5min": 0.07,
  "system/mem_free_bytes": 162828288,
  "system/mem_total_bytes": 2095820800
}
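
With this many counters, it can help to filter a snapshot down to the ones that hint at trouble. The sketch below is one possible heuristic, not a Mesos feature: it keeps metrics whose name contains one of a few marker substrings and whose value is greater than zero. The function name and the choice of markers are assumptions made for illustration.

```python
def nonzero_problem_counters(snapshot, markers=("error", "failed", "lost")):
    """Return {metric: value} for metrics whose name contains a marker and whose value is > 0."""
    return {
        name: value
        for name, value in sorted(snapshot.items())
        # Name-based heuristic: e.g. "slave/container_launch_errors" matches "error".
        if value > 0 and any(marker in name for marker in markers)
    }
```

Applied to the slave output above, this would report `slave/container_launch_errors` and `slave/tasks_failed`, both with the value 5, since every other matching counter is zero.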

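In addition, the address http://SLAVE_NODE:5051/monitor/statistics.json shows container network statistics for that slave node, including inbound and outbound traffic, dropped packets, and queue status.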
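
The per-container statistics from the endpoint mentioned above can be aggregated in a few lines. The sketch below assumes the response is a JSON array in which each entry carries a `statistics` object, and that network counters appear under the names `net_rx_bytes` and `net_tx_bytes`. Both the response shape and the field names are assumptions here and may differ between Mesos versions and isolator configurations, so missing fields are treated as zero.

```python
def total_network_bytes(containers):
    """Sum received and transmitted bytes over all entries of a statistics response."""
    rx_total = 0
    tx_total = 0
    for entry in containers:
        # Assumed layout: {"executor_id": ..., "statistics": {"net_rx_bytes": ..., ...}}
        stats = entry.get("statistics", {})
        rx_total += stats.get("net_rx_bytes", 0)
        tx_total += stats.get("net_tx_bytes", 0)
    return rx_total, tx_total
```

The input is the decoded JSON, for example the result of `json.load()` on the HTTP response. The function returns a `(received, transmitted)` tuple in bytes.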
