Custom DataNode Monitoring

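This article covers the key performance metrics of a Hadoop DataNode, including data read/write operations, cache usage, JVM garbage collection statistics, and disk space usage, as a reference for tuning a Hadoop cluster.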


Collection endpoint: http://XXXXX:50075/jmx?qry=Hadoop:service=DataNode,name=*
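As a rough illustration of polling this endpoint, here is a minimal Python sketch. It assumes the `/jmx` servlet returns JSON of the form `{"beans": [...]}` in which every bean has a `name` field. The helper names (`build_url`, `parse_beans`, `find_bean`, `fetch_jmx`) and the host `dn-host.example` are hypothetical and not part of Hadoop. Beans are looked up by prefix because some bean names carry a host and port suffix, as in `DataNodeActivity-R720ip67-50010`.

```python
import json
import urllib.parse
import urllib.request

# Query string taken from the collection endpoint above.
QUERY = "Hadoop:service=DataNode,name=*"


def build_url(host, port=50075, query=QUERY):
    """Build the /jmx URL for one DataNode (host is a placeholder)."""
    encoded = urllib.parse.quote(query, safe=":=,*")
    return "http://%s:%d/jmx?qry=%s" % (host, port, encoded)


def parse_beans(payload):
    """Index the beans of a /jmx JSON response by their 'name' field."""
    doc = json.loads(payload)
    return {bean["name"]: bean for bean in doc.get("beans", [])}


def find_bean(beans, prefix):
    """Return the first bean whose name starts with prefix, or None."""
    for name, bean in beans.items():
        if name.startswith(prefix):
            return bean
    return None


def fetch_jmx(host, port=50075, timeout=5):
    """Fetch and parse the beans from a live DataNode (needs network access)."""
    with urllib.request.urlopen(build_url(host, port), timeout=timeout) as resp:
        return parse_beans(resp.read().decode("utf-8"))


# Offline demonstration with a hand-written sample payload (values are made up).
SAMPLE = '{"beans": [{"name": "Hadoop:service=DataNode,name=JvmMetrics", "GcCount": 12}]}'
beans = parse_beans(SAMPLE)
jvm = find_bean(beans, "Hadoop:service=DataNode,name=JvmMetrics")
print(jvm["GcCount"])
```

Against a real cluster, `fetch_jmx("dn-host.example")` would replace the `SAMPLE` payload; the parsing and lookup steps stay the same.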

I. DataNode Performance Metrics (Core Metrics)

Hadoop:service=DataNode,name=DataNodeActivity-R720ip67-50010
| Metric | Type (GAUGE, COUNTER) | Unit | Description |
| --- | --- | --- | --- |
| BytesWritten | COUNTER | | Total number of bytes written to DataNode |
| BytesRead | COUNTER | | Total number of bytes read from DataNode |
| BlocksWritten | COUNTER | | Total number of blocks written to DataNode |
| BlocksRead | COUNTER | | Total number of blocks read from DataNode |
| BlocksReplicated | COUNTER | | Total number of blocks replicated |
| BlocksRemoved | COUNTER | | Total number of blocks removed |
| BlocksVerified | COUNTER | | Total number of blocks verified |
| BlockVerificationFailures | COUNTER | | Total number of verification failures |
| BlocksCached | GAUGE | | Total number of blocks cached |
| BlocksUncached | GAUGE | | Total number of blocks uncached |
| ReadsFromLocalClient | COUNTER | | Total number of read operations from local client |
| ReadsFromRemoteClient | COUNTER | | Total number of read operations from remote client |
| WritesFromLocalClient | COUNTER | | Total number of write operations from local client |
| WritesFromRemoteClient | COUNTER | | Total number of write operations from remote client |
| BlocksGetLocalPathInfo | COUNTER | | Total number of operations to get local path names of blocks |
| FsyncCount | COUNTER | | Total number of fsync operations |
| VolumeFailures | COUNTER | | Total number of volume failures that occurred |
| ReadBlockOpNumOps | COUNTER | | Total number of read operations |
| ReadBlockOpAvgTime | GAUGE | ms | Average time of read operations in milliseconds |
| WriteBlockOpNumOps | COUNTER | | Total number of write operations |
| WriteBlockOpAvgTime | GAUGE | ms | Average time of write operations in milliseconds |
| BlockChecksumOpNumOps | COUNTER | | Total number of blockChecksum operations |
| BlockChecksumOpAvgTime | GAUGE | ms | Average time of blockChecksum operations in milliseconds |
| CopyBlockOpNumOps | COUNTER | | Total number of block copy operations |
| CopyBlockOpAvgTime | GAUGE | ms | Average time of block copy operations in milliseconds |
| ReplaceBlockOpNumOps | COUNTER | | Total number of block replace operations |
| ReplaceBlockOpAvgTime | GAUGE | ms | Average time of block replace operations in milliseconds |
| HeartbeatsNumOps | COUNTER | | Total number of heartbeats |
| HeartbeatsAvgTime | GAUGE | ms | Average heartbeat time in milliseconds |
| BlockReportsNumOps | COUNTER | | Total number of block report operations |
| BlockReportsAvgTime | GAUGE | ms | Average time of block report operations in milliseconds |
| CacheReportsNumOps | COUNTER | | Total number of cache report operations |
| CacheReportsAvgTime | GAUGE | ms | Average time of cache report operations in milliseconds |
| PacketAckRoundTripTimeNanosNumOps | COUNTER | | Total number of ack round trips |
| PacketAckRoundTripTimeNanosAvgTime | GAUGE | ns | Average time from ack send to receive minus the downstream ack time in nanoseconds |
| FlushNanosNumOps | COUNTER | | Total number of flushes |
| FlushNanosAvgTime | GAUGE | ns | Average flush time in nanoseconds |
| FsyncNanosNumOps | COUNTER | | Total number of fsync operations |
| FsyncNanosAvgTime | GAUGE | ns | Average fsync time in nanoseconds |
| SendDataPacketBlockedOnNetworkNanosNumOps | COUNTER | | Total number of sending packets |
| SendDataPacketBlockedOnNetworkNanosAvgTime | GAUGE | ns | Average waiting time of sending packets in nanoseconds |
| SendDataPacketTransferNanosNumOps | COUNTER | | Total number of sending packets |
| SendDataPacketTransferNanosAvgTime | GAUGE | ns | Average transfer time of sending packets in nanoseconds |
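The COUNTER metrics in the table above are cumulative totals, so a monitoring system usually converts two consecutive samples into a per-second rate, while GAUGE values can be read directly. The sketch below shows one way to do this. The function names, the 60-second interval, and all sample values are assumptions made for illustration. It also converts a `...Nanos...AvgTime` value to milliseconds, since those metrics are described in nanoseconds.

```python
def counter_rate(prev, curr, interval_seconds):
    """Per-second rate between two samples of a COUNTER metric.

    Returns None when the counter went backwards (for example after a
    DataNode restart) or when the interval is not positive.
    """
    if interval_seconds <= 0 or curr < prev:
        return None
    return (curr - prev) / interval_seconds


def nanos_to_millis(value_ns):
    """Convert a nanosecond average to milliseconds."""
    return value_ns / 1_000_000.0


# Hypothetical samples taken 60 seconds apart (values are made up).
prev = {"BytesWritten": 1_000_000, "WriteBlockOpNumOps": 40}
curr = {
    "BytesWritten": 7_000_000,
    "WriteBlockOpNumOps": 100,
    "FlushNanosAvgTime": 250_000.0,
}

bytes_per_sec = counter_rate(prev["BytesWritten"], curr["BytesWritten"], 60)
write_ops_per_sec = counter_rate(
    prev["WriteBlockOpNumOps"], curr["WriteBlockOpNumOps"], 60
)
flush_ms = nanos_to_millis(curr["FlushNanosAvgTime"])

print(bytes_per_sec, write_ops_per_sec, flush_ms)
```

Returning `None` on a counter reset is a design choice of this sketch; it avoids reporting a large negative rate right after a restart.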

 

II. DataNode JvmMetrics Details (Core Metrics)

Hadoop:service=DataNode,name=JvmMetrics
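| Metric | Type (GAUGE, COUNTER) | Unit | Description |
| --- | --- | --- | --- |
| GcCountParNew | COUNTER | | Number of young-generation GCs |
| GcTimeMillisParNew | COUNTER | ms | Time spent in young-generation GC (ms) |
| GcCountConcurrentMarkSweep | COUNTER | | Number of old-generation GCs |
| GcTimeMillisConcurrentMarkSweep | COUNTER | ms | Time spent in old-generation GC (ms) |
| GcCount | COUNTER | | Total number of GCs |
| GcTimeMillis | COUNTER | ms | Total time spent in GC (ms) |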
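Because the GC metrics are cumulative counters, a common way to use them is to compare two samples and derive the share of time spent in GC and the average time per collection over that window. The following is a minimal sketch under that assumption. The function names and the sample values are hypothetical.

```python
def gc_overhead(prev, curr, interval_ms):
    """Fraction of the sampling interval spent in GC (0.01 means 1%).

    Returns None if the interval is not positive or the counter was reset.
    """
    delta_ms = curr["GcTimeMillis"] - prev["GcTimeMillis"]
    if interval_ms <= 0 or delta_ms < 0:
        return None
    return delta_ms / interval_ms


def avg_gc_time_ms(prev, curr):
    """Average time per collection between two samples, or None if no GC ran."""
    delta_count = curr["GcCount"] - prev["GcCount"]
    delta_ms = curr["GcTimeMillis"] - prev["GcTimeMillis"]
    if delta_count <= 0 or delta_ms < 0:
        return None
    return delta_ms / delta_count


# Hypothetical JvmMetrics samples taken 60 seconds (60000 ms) apart.
prev = {"GcCount": 100, "GcTimeMillis": 5000}
curr = {"GcCount": 110, "GcTimeMillis": 5600}

overhead = gc_overhead(prev, curr, 60_000)
per_gc = avg_gc_time_ms(prev, curr)
print(overhead, per_gc)
```

The same two functions can be applied to the `ParNew` and `ConcurrentMarkSweep` pairs by passing dictionaries keyed accordingly, which separates young-generation from old-generation behavior.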

 

III. DataNode Space Metrics (Core Metrics)

Hadoop:service=DataNode,name=FSDatasetState-null
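| Metric | Type (GAUGE, COUNTER) | Unit | Description |
| --- | --- | --- | --- |
| Capacity | GAUGE | | DataNode capacity |
| DfsUsed | GAUGE | | DataNode capacity already used |
| NumFailedVolumes | GAUGE | | Number of failed volumes on the DataNode |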
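These three gauges lend themselves to simple health checks: the ratio of `DfsUsed` to `Capacity`, and whether `NumFailedVolumes` is above zero. The sketch below assumes both space values are reported in the same unit. The function name, the 85% threshold, and the sample values are hypothetical and should be adjusted to the cluster in question.

```python
def check_space(bean, used_ratio_threshold=0.85):
    """Return (used_ratio, alerts) for one FSDatasetState-style bean.

    used_ratio is None when Capacity is missing or zero.
    """
    capacity = bean.get("Capacity", 0)
    used = bean.get("DfsUsed", 0)
    failed = bean.get("NumFailedVolumes", 0)

    ratio = used / capacity if capacity > 0 else None
    alerts = []
    if ratio is not None and ratio >= used_ratio_threshold:
        alerts.append("dfs usage at %.1f%%" % (ratio * 100))
    if failed > 0:
        alerts.append("%d failed volume(s)" % failed)
    return ratio, alerts


# Hypothetical sample (values are made up).
sample = {"Capacity": 1000, "DfsUsed": 900, "NumFailedVolumes": 1}
ratio, alerts = check_space(sample)
print(ratio, alerts)
```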


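To monitor a Hadoop cluster with Prometheus, you can proceed as follows:

1. Install Prometheus: install and configure Prometheus on your monitoring server, either from the binaries on the official Prometheus website or through a package manager.
2. Configure Prometheus: define the Hadoop cluster's monitoring targets in the Prometheus configuration file, using target discovery (service discovery) or by listing each Hadoop component manually.
3. Expose the Hadoop component metrics: each Hadoop component (NameNode, DataNode, ResourceManager, NodeManager, and so on) can expose its metrics through JMX (Java Management Extensions). Enable JMX in the Hadoop configuration and make sure Prometheus can reach the metrics.
4. Use an exporter: an existing tool such as JMX Exporter converts JMX metrics into a format Prometheus can scrape. Node Exporter complements it with host-level metrics (CPU, memory, disk); it does not read JMX.
5. Configure rules and alerts: once Prometheus is collecting the Hadoop metrics, use PromQL to define custom rules and alerts, so that an alert fires when a metric crosses a preset threshold.
6. Visualize and report: beyond the basic UI that ships with Prometheus, tools such as Grafana integrate with Prometheus and provide richer dashboards and reports.

This is only a rough outline. The actual configuration depends on your environment and requirements, so consult the relevant documentation for the details.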
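To make the "format Prometheus can scrape" in step 4 more concrete, here is a hand-rolled sketch that renders a few JMX values in the Prometheus text exposition format (`# TYPE` lines followed by `name{labels} value`). It is only an illustration and does not reproduce the naming rules of JMX Exporter. The `hadoop_datanode_` prefix, the function names, the `instance` label value, and the sample numbers are all assumptions.

```python
import re


def to_snake(name):
    """Convert a CamelCase JMX attribute to snake_case, e.g. BytesWritten -> bytes_written."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def render(metrics, types, instance):
    """Render metrics in Prometheus text exposition format.

    metrics: attribute name -> numeric value
    types:   attribute name -> "counter" or "gauge" (defaults to "untyped")
    """
    lines = []
    for key in sorted(metrics):
        name = "hadoop_datanode_" + to_snake(key)  # hypothetical prefix
        lines.append("# TYPE %s %s" % (name, types.get(key, "untyped")))
        lines.append('%s{instance="%s"} %s' % (name, instance, metrics[key]))
    return "\n".join(lines) + "\n"


# Hypothetical values for two metrics.
metrics = {"BytesWritten": 7000000, "DfsUsed": 900}
types = {"BytesWritten": "counter", "DfsUsed": "gauge"}
text = render(metrics, types, "dn-host.example:50075")
print(text, end="")
```

In practice JMX Exporter handles this conversion, so a script like this is mainly useful for understanding what Prometheus expects to scrape.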