There is a good blog post on this:
[color=red][size=medium]http://www.cloudera.com/blog/2009/03/hadoop-metrics/[/size][/color]
The HDFS and MapReduce daemons collect information about events and measurements
that are collectively known as metrics. For example, datanodes collect the following
metrics (and many more): the number of bytes written, the number of blocks
replicated, and the number of read requests from clients (both local and remote).
Metrics belong to a context, and Hadoop currently uses “dfs”, “mapred”, “rpc”, and
“jvm” contexts. Hadoop daemons usually collect metrics under several contexts. For
example, datanodes collect metrics for the “dfs”, “rpc”, and “jvm” contexts.
[color=red][size=medium]How Do Metrics Differ from Counters?[/size][/color]
The main difference is their scope: metrics are collected by Hadoop daemons, whereas
counters are collected for MapReduce tasks and aggregated
for the whole job. They have different audiences, too: broadly speaking, metrics
are for administrators, and counters are for MapReduce users.
The way they are collected and aggregated is also different. Counters are a MapReduce
feature, and the MapReduce system ensures that counter values are propagated from
the tasktrackers where they are produced, back to the jobtracker, and finally back to
the client running the MapReduce job. (Counters are propagated via RPC heartbeats.) Both the tasktrackers and the jobtracker perform aggregation.
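To make the counters side concrete, here is a minimal sketch (not from the post; the class and counter names are made up for illustration) of a mapper incrementing a user-defined counter. Each task updates its own copy locally, the values travel back with the heartbeats, and they end up as one per-job total.

// Minimal sketch of a user-defined counter (hypothetical names).
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCountingMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  // Hypothetical counter enum, for illustration only.
  enum MyCounters { BLANK_LINES }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.toString().trim().isEmpty()) {
      // Incremented per task; the framework aggregates across all tasks.
      context.getCounter(MyCounters.BLANK_LINES).increment(1);
      return;
    }
    context.write(new Text("lines"), new LongWritable(1));
  }
}

After the job finishes, the client can read the aggregated value back through Job.getCounters().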
The collection mechanism for metrics is decoupled from the component that receives
the updates, and there are various pluggable outputs, including local files, Ganglia, and
JMX. The daemon collecting the metrics performs aggregation on them before they are
sent to the output.
Metrics are configured in the conf/hadoop-metrics.properties file.
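For instance, a minimal sketch of that file might send the "dfs" metrics to a local file and the "mapred" metrics to Ganglia (the file path, Ganglia host, and period below are placeholders to adapt to your cluster):

# Write "dfs" metrics to a local file every 10 seconds
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log

# Send "mapred" metrics to a Ganglia gmond
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=gangliahost:8649

Contexts left unconfigured fall back to the NullContext, so their metrics are not published anywhere.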
MetricsContext types (a sample configuration combining them follows the list):
NullContext
FileContext
GangliaContext
NullContextWithUpdateThread
CompositeContext
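As a rough sketch of how these types combine (the class names are from the metrics package; the arity/sub-context layout follows the usual CompositeContext convention, and the paths and hosts are placeholders): NullContextWithUpdateThread keeps the in-memory metrics refreshed so they can be browsed over JMX without being pushed anywhere, while CompositeContext fans one context's metrics out to several outputs.

# Keep "rpc" metrics updated in memory for JMX, but don't publish them
rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
rpc.period=10

# Send "jvm" metrics to both a local file and Ganglia
jvm.class=org.apache.hadoop.metrics.spi.CompositeContext
jvm.arity=2
jvm.sub1.class=org.apache.hadoop.metrics.file.FileContext
jvm.fileName=/tmp/jvmmetrics.log
jvm.sub2.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.servers=gangliahost:8649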

In short: this post covers the kinds of metrics Hadoop collects, including the bytes written and blocks replicated reported by datanodes, and contrasts metrics with counters. Metrics are collected by Hadoop daemons, while counters belong to MapReduce tasks and are aggregated for the whole job. It also touches on how metrics are collected and aggregated.