Controlling User Logging in Hadoop

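This article looks at how MapReduce task attempt logs can quickly exhaust disk space on a Hadoop cluster, and covers three ways to keep disks from overflowing: limiting how long logs are retained through a configuration parameter, limiting log file size, and adjusting log levels.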

Imagine that you’re a Hadoop administrator, and to make things interesting you’re managing a multi-tenant Hadoop cluster where data scientists, developers and QA are pounding your cluster. One day you notice that your disks are filling up fast, and after some investigating you realize that the root cause is your MapReduce task attempt logs.

How do you guard against this sort of thing happening? Before we get to that we need to understand where these files exist, and how they’re written. The figure below shows the three log files that are created for each task attempt in MapReduce. Notice that the logs are written to the local disk of the task attempt.

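[Figure: the three log files created for each MapReduce task attempt, written to the local disk of the task attempt]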
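To confirm that task attempt logs really are what is eating the disk, it helps to rank the log directories by size. The following is a rough sketch, not a Hadoop tool: `largest_userlogs` is a hypothetical helper, the directory is passed in as an argument because its location depends on your installation, and the layout of one subdirectory per job or attempt is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the largest subdirectories of a log directory.
#   $1: directory holding task attempt logs (assumed: one subdirectory per job or attempt)
#   $2: how many entries to show (default 10)
largest_userlogs() {
  local dir="$1" top="${2:-10}"
  # du -sk prints "<size in KB><TAB><path>"; sort numerically, largest first.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + \
    | sort -rn \
    | head -n "$top"
}

# Example call (the path is a placeholder):
#   largest_userlogs /path/to/userlogs 5
```

Running this on each node gives a quick picture of which jobs are responsible before you change any cluster-wide setting.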

OK, so how does Hadoop normally make sure that our disks don’t fill up with these task attempt logs? I’ll cover three approaches.

Approach 1: mapred.userlog.retain.hours

Hadoop has a mapred.userlog.retain.hours configurable, which is defined in mapred-default.xml as:

The maximum time, in hours, for which the user-logs are to be retained after the job completion.

Great, but what if your disks are filling up before Hadoop has had a chance to automatically clean them up? It may be tempting to reduce mapred.userlog.retain.hours to a smaller value, but before you do that you should know that there’s a bug in Hadoop versions 1.x and earlier (see MAPREDUCE-158), where the logs for jobs that run longer than mapred.userlog.retain.hours are accidentally deleted. So maybe we should look elsewhere to solve our overflowing logs problem.
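Before touching the retention window, it can be useful to see which log directories are already older than a candidate value. A rough sketch follows, with two loud assumptions: `userlogs_older_than` is a hypothetical helper, and it uses directory modification time as a stand-in for job completion, which is not how Hadoop itself tracks retention. It only lists candidates and deletes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print subdirectories of $1 last modified more than $2 hours ago.
# Listing only; nothing is removed.
userlogs_older_than() {
  local dir="$1" hours="$2"
  # -mmin takes minutes, so convert the hour count first.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -mmin "+$((hours * 60))"
}

# Example call (the path is a placeholder):
#   userlogs_older_than /path/to/userlogs 24
```

If the list is short while the disk is still nearly full, a smaller retention value would not have helped much anyway.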

Approach 2: mapred.userlog.limit.kb

Hadoop has another configurable, mapred.userlog.limit.kb, which can be used to limit the file size of syslog, which is the log4j log output file. Let’s peek again at the documentation:

The maximum size of user-logs of each task in KB. 0 disables the cap.

The default value is 0, which means that log writes go straight to the log file. So all we need to do is to set a positive value and we’re set, right? Not so fast - it turns out that this approach has two disadvantages:

  1. Hadoop and user logs are actually cached in memory, so you’re taking away mapred.userlog.limit.kb kilobytes worth of memory from your task attempt’s process.
  2. Logs are only written out when the task attempt process has completed, and only contain the last mapred.userlog.limit.kb kilobytes worth of log entries, so this can make it challenging to debug long-running tasks.
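The "last N kilobytes" behaviour in the second point is easy to picture with `tail`. This is only an illustration of the effect: Hadoop does its buffering inside the task attempt process, not with a shell command, and `keep_last_kb` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration only: emit just the last $1 kilobytes of whatever arrives on stdin.
keep_last_kb() {
  local limit_kb="$1"
  # Everything before the final limit_kb * 1024 bytes is discarded.
  tail -c "$((limit_kb * 1024))"
}

# Example call (the file name is a placeholder):
#   keep_last_kb 100 < some-task.log
```

Anything a long-running task logged early on, such as its startup parameters or the first sign of trouble, falls outside that window and is lost.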

OK, so what else can we try? We have one more solution, log levels.

Approach 3: Changing log levels

Ideally all your Hadoop users got the memo about minimizing excessive logging. In reality you have limited control over what users decide to log in their code; what you do control is the task attempt log levels.

If you have a MapReduce job that is logging aggressively in package com.example.mr, you may be tempted to use the daemonlog CLI to connect to all the TaskTracker daemons and change the logging to ERROR level:

hadoop daemonlog -setlevel <host:port> com.example.mr ERROR

Yet again we hit a roadblock - this will only change the logging level for the TaskTracker process, and not for the task attempt process. Drat! This really only leaves one option, which is to update your ${HADOOP_HOME}/conf/log4j.properties on all your data nodes by adding the following line to this file:

log4j.logger.com.example.mr=ERROR

The great thing about this change is that you don’t need to restart MapReduce, since any new task attempt processes will pick up your changes to log4j.properties.
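Because the same line has to land in log4j.properties on every node, it helps if the edit is safe to repeat. A minimal sketch follows, assuming a hypothetical helper named `set_log4j_level` and existing settings written as exact `key=value` lines with no surrounding spaces. How the script reaches each node is left to your own tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the level for one logger in a log4j.properties file.
# Running it twice leaves exactly one line for that logger.
#   $1: properties file   $2: package name   $3: level (for example ERROR)
set_log4j_level() {
  local file="$1" pkg="$2" level="$3"
  local key="log4j.logger.${pkg}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Copy every line except an existing "key=..." entry for this logger.
  awk -v key="$key" 'index($0, key "=") != 1 { print }' "$file" > "$tmp" || return 1
  # Append the desired setting.
  printf '%s=%s\n' "$key" "$level" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example call:
#   set_log4j_level "${HADOOP_HOME}/conf/log4j.properties" com.example.mr ERROR
```

Rewriting the entry rather than blindly appending avoids piling up conflicting lines when the level is changed again later.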
