Hadoop operations: HA throws "JournalNode can not write"

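This post describes an exception seen on Hadoop cdh4.3.2 in high-availability (HA) mode, where a JournalNode could not be written to, and gives the fix: copy the data from a healthy JournalNode into the failed node's journaldata directory, then restart the service.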

Hadoop version: cdh4.3.2


Exception description

The JournalNode reports that it cannot be written to, and the backend throws the following exception:
1.6.232:50854: error: org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /data/hadoop/journalnode/journaldata/jn/mycluster not formatted
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /data/hadoop/journalnode/journaldata/jn/mycluster not formatted
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:451)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:634)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:177)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:196)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
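
When several journals or nodes are involved, it can help to pull the affected directories straight out of the JournalNode log. The following is a minimal shell sketch, not part of the original post; the function name `unformatted_journal_dirs` is hypothetical, and the log file to pass in depends on the installation.

```shell
#!/usr/bin/env bash
# Sketch: list every storage directory that a JournalNode log reports as
# "not formatted". The function name is illustrative (an assumption).
unformatted_journal_dirs() {
  # $1: path to a JournalNode log file
  # Keep only the directory between "Journal Storage Directory" and
  # "not formatted", then de-duplicate.
  sed -n 's/.*JournalNotFormattedException: Journal Storage Directory \(.*\) not formatted.*/\1/p' "$1" | sort -u
}
```

Run against a log containing the trace above, it would print `/data/hadoop/journalnode/journaldata/jn/mycluster` once, even though the exception line appears twice.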


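Solution

Copy the journal data from another, healthy JournalNode into the configured journaldata directory on the failed node, then restart the service. Done.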
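
The copy step can be sketched as a small shell function. This is an illustration under stated assumptions rather than the author's exact procedure: the function name `restore_journal_dir` is hypothetical, the healthy node's journal directory is assumed to be readable at a filesystem path (for example after fetching it with scp or rsync), and the JournalNode on the failed host is assumed to be stopped while the copy runs. The check for `current/VERSION` and the removal of `in_use.lock` reflect the usual layout of a Hadoop storage directory and should be verified against the actual directory contents.

```shell
#!/usr/bin/env bash
# Sketch only: copy a healthy JournalNode's journal directory over an
# unformatted one. Function name and staging path are assumptions.
restore_journal_dir() {
  local src="$1"   # healthy journal directory, e.g. .../jn/mycluster
  local dst="$2"   # the same directory on the failed node

  # A formatted storage directory is expected to contain current/VERSION.
  if [ ! -f "$src/current/VERSION" ]; then
    echo "source $src does not look like a formatted journal directory" >&2
    return 1
  fi

  mkdir -p "$dst" || return 1
  # -a preserves permissions and timestamps (and ownership when run as root).
  cp -a "$src/." "$dst/" || return 1
  # Do not carry over a lock file held by the source JournalNode process.
  rm -f "$dst/in_use.lock"
}
```

For the directory named in the exception, a call might look like `restore_journal_dir /tmp/jn-from-healthy/mycluster /data/hadoop/journalnode/journaldata/jn/mycluster`, where the first path is a hypothetical staging copy. Afterwards, restart the JournalNode with whatever mechanism the installation uses.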




My self-built Hadoop cluster has 3 masters and 4 core nodes. After deployment I found that, for some unknown reason, the .log files under hadoop/logs on master1 are no longer being written; only the .out files are. Normally the .out files are small and the .log files are fairly large, right?

    [ec2-user@hadoop-master01 logs]$ ll
    total 51876
    -rw-rw-r-- 1 ec2-user ec2-user 1019984 Sep 28 18:27 hadoop-ec2-user-historyserver-hadoop-master01.log
    -rw-rw-r-- 1 ec2-user ec2-user 130399 Oct 16 09:56 hadoop-ec2-user-historyserver-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 45247 Oct 15 16:22 hadoop-ec2-user-historyserver-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 51151 Oct 15 16:17 hadoop-ec2-user-historyserver-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 116382 Oct 15 14:05 hadoop-ec2-user-historyserver-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 46976 Oct 14 14:33 hadoop-ec2-user-historyserver-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 46688 Oct 14 13:53 hadoop-ec2-user-historyserver-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 12621739 Sep 28 18:28 hadoop-ec2-user-journalnode-hadoop-master01.log
    -rw-rw-r-- 1 ec2-user ec2-user 1339323 Oct 16 09:58 hadoop-ec2-user-journalnode-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 257238 Oct 15 18:11 hadoop-ec2-user-journalnode-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 65845 Oct 15 16:22 hadoop-ec2-user-journalnode-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 66349 Oct 15 16:17 hadoop-ec2-user-journalnode-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 55910 Oct 15 15:57 hadoop-ec2-user-journalnode-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 3240752 Oct 15 15:49 hadoop-ec2-user-journalnode-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 1900111 Oct 16 09:57 hadoop-ec2-user-namenode-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 18:09 hadoop-ec2-user-namenode-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 240450 Oct 15 18:09 hadoop-ec2-user-namenode-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 241594 Oct 15 18:05 hadoop-ec2-user-namenode-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 236717 Oct 15 17:57 hadoop-ec2-user-namenode-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 198509 Oct 15 17:40 hadoop-ec2-user-namenode-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 10266208 Sep 28 18:27 hadoop-ec2-user-resourcemanager-hadoop-master01.log
    -rw-rw-r-- 1 ec2-user ec2-user 765717 Oct 16 09:23 hadoop-ec2-user-resourcemanager-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 577288 Oct 15 16:22 hadoop-ec2-user-resourcemanager-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 572997 Oct 15 16:17 hadoop-ec2-user-resourcemanager-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 573291 Oct 15 15:33 hadoop-ec2-user-resourcemanager-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 727212 Oct 15 15:20 hadoop-ec2-user-resourcemanager-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 848229 Oct 15 15:05 hadoop-ec2-user-resourcemanager-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 482956 Sep 28 18:27 hadoop-ec2-user-timelineserver-hadoop-master01.log
    -rw-rw-r-- 1 ec2-user ec2-user 2669030 Oct 16 09:26 hadoop-ec2-user-timelineserver-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 1069795 Oct 15 16:22 hadoop-ec2-user-timelineserver-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 1076432 Oct 15 16:17 hadoop-ec2-user-timelineserver-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 1070341 Oct 15 15:33 hadoop-ec2-user-timelineserver-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 1064610 Oct 15 15:20 hadoop-ec2-user-timelineserver-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 3670854 Oct 15 15:06 hadoop-ec2-user-timelineserver-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 5036578 Sep 28 18:28 hadoop-ec2-user-zkfc-hadoop-master01.log
    -rw-rw-r-- 1 ec2-user ec2-user 99301 Oct 15 18:12 hadoop-ec2-user-zkfc-hadoop-master01.out
    -rw-rw-r-- 1 ec2-user ec2-user 195445 Oct 15 18:11 hadoop-ec2-user-zkfc-hadoop-master01.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 80419 Oct 15 16:22 hadoop-ec2-user-zkfc-hadoop-master01.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 79427 Oct 15 16:17 hadoop-ec2-user-zkfc-hadoop-master01.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 80419 Oct 15 15:57 hadoop-ec2-user-zkfc-hadoop-master01.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 79427 Oct 15 15:49 hadoop-ec2-user-zkfc-hadoop-master01.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 0 Sep 17 11:54 SecurityAuth-ec2-user.audit
    [ec2-user@hadoop-master01 logs]$ pwd
    /data/module/hadoop-3.3.4/logs

    [ec2-user@hadoop-master02 hadoop-3.3.4]$ cd logs/
    [ec2-user@hadoop-master02 logs]$ ll
    total 142064
    -rw-rw-r-- 1 ec2-user ec2-user 41175348 Oct 16 09:58 hadoop-ec2-user-journalnode-hadoop-master02.log
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 18:12 hadoop-ec2-user-journalnode-hadoop-master02.out
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 16:23 hadoop-ec2-user-journalnode-hadoop-master02.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 16:18 hadoop-ec2-user-journalnode-hadoop-master02.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:57 hadoop-ec2-user-journalnode-hadoop-master02.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:49 hadoop-ec2-user-journalnode-hadoop-master02.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 13 17:36 hadoop-ec2-user-journalnode-hadoop-master02.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 87261368 Oct 16 09:57 hadoop-ec2-user-namenode-hadoop-master02.log
    -rw-rw-r-- 1 ec2-user ec2-user 6371 Oct 15 18:13 hadoop-ec2-user-namenode-hadoop-master02.out
    -rw-rw-r-- 1 ec2-user ec2-user 6371 Oct 15 17:46 hadoop-ec2-user-namenode-hadoop-master02.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 16:18 hadoop-ec2-user-namenode-hadoop-master02.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:57 hadoop-ec2-user-namenode-hadoop-master02.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:49 hadoop-ec2-user-namenode-hadoop-master02.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 13 17:36 hadoop-ec2-user-namenode-hadoop-master02.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 11161047 Oct 16 05:24 hadoop-ec2-user-resourcemanager-hadoop-master02.log
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 16:23 hadoop-ec2-user-resourcemanager-hadoop-master02.out
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 16:18 hadoop-ec2-user-resourcemanager-hadoop-master02.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 15:34 hadoop-ec2-user-resourcemanager-hadoop-master02.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 15:21 hadoop-ec2-user-resourcemanager-hadoop-master02.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 15:06 hadoop-ec2-user-resourcemanager-hadoop-master02.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 2218 Oct 15 14:23 hadoop-ec2-user-resourcemanager-hadoop-master02.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 5751751 Oct 15 18:12 hadoop-ec2-user-zkfc-hadoop-master02.log
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 18:12 hadoop-ec2-user-zkfc-hadoop-master02.out
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 16:23 hadoop-ec2-user-zkfc-hadoop-master02.out.1
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 16:18 hadoop-ec2-user-zkfc-hadoop-master02.out.2
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:58 hadoop-ec2-user-zkfc-hadoop-master02.out.3
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 15 15:49 hadoop-ec2-user-zkfc-hadoop-master02.out.4
    -rw-rw-r-- 1 ec2-user ec2-user 693 Oct 13 17:36 hadoop-ec2-user-zkfc-hadoop-master02.out.5
    -rw-rw-r-- 1 ec2-user ec2-user 0 Sep 17 11:55 SecurityAuth-ec2-user.audit
    [ec2-user@hadoop-master02 logs]$ pwd
    /data/module/hadoop-3.3.4/logs

I don't remember doing anything to master01, but its .log files have simply stopped being written.
10-17
<think>Hmm, the user is asking about why log files aren't being written to the hadoop/logs directory on master1 in their self-built Hadoop cluster with 3 masters and 4 cores. This seems like a troubleshooting scenario for Hadoop cluster operations. First, I need to recall the key components involved in Hadoop logging. From the references provided earlier, I see that Hadoop's configuration plays a crucial role, especially files like hadoop-env.sh which sets environment variables including logging paths. Looking at the user's cluster setup - 3 masters likely handling NameNode high availability (active/standby setup based on reference 2 and 5), and 4 core nodes for data/compute. The fact that only master1 has this issue suggests it's node-specific rather than cluster-wide. The most probable causes would be: Permissions issues - Hadoop service account (usually 'hadoop') needs write access to the logs directory. Reference 3 mentions container-executor.cfg which relates to YARN permissions, but that's more for containers than NameNode logging. Configuration errors - The HADOOP_LOG_DIR setting in hadoop-env.sh might point elsewhere on master1. Since reference 1 shows how configurations propagate via scp, there might be local overrides. Service status - If the NameNode isn't running properly on master1 (could be standby node), it might not generate logs. Reference 5 discusses bootstrapping standby nodes which relates to this. I should also consider disk space issues and selinux/apparmor restrictions since those commonly block log writes. The journalnode process mentioned in reference 2 could be relevant if it's part of the HA setup. The troubleshooting steps should start simple - verify permissions and configs locally before checking service status across the cluster. Since it's a multi-master setup, we need to confirm whether master1 is currently active or standby, as standby nodes might have different logging behavior. 
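</think>
In a self-built Hadoop cluster with 3 master and 4 core nodes, the `.log` files under `hadoop/logs` on master1 are no longer being written. Work through the following checks:

### 1. Permission problems (the most common cause)

1. **Check directory ownership**
   Make sure the log directory is owned by the Hadoop service account (usually the `hadoop` user; in the listing above the files belong to `ec2-user`):
   ```bash
   ls -ld /path/to/hadoop/logs
   ```
   ✅ Correct example: `drwxr-xr-x 2 hadoop hadoop 4096 ...`
2. **Fix the permissions**
   ```bash
   chown -R hadoop:hadoop /path/to/hadoop/logs
   chmod 1775 /path/to/hadoop/logs  # add the sticky bit to prevent accidental deletion
   ```

### 2. Configuration errors

1. **Check the `hadoop-env.sh` configuration**
   Confirm that the log path variable `HADOOP_LOG_DIR` points to the right location:
   ```bash
   grep HADOOP_LOG_DIR $HADOOP_HOME/etc/hadoop/hadoop-env.sh
   ```
   ✅ The correct setting should be:
   `export HADOOP_LOG_DIR=/path/to/hadoop/logs`
   *Note: reference [1] emphasizes that this file controls the environment paths[^1].*
2. **Verify that the configuration files are in sync**
   If the cluster configuration was not synced to master1, sync it again:
   ```bash
   # Sync from the configuration host to master1 (method from reference [2])
   scp -r config_host:$HADOOP_HOME/etc/hadoop/* hadoop@master1:$HADOOP_HOME/etc/hadoop/
   ```

### 3. Abnormal service state

1. **Check the NameNode process state**
   ```bash
   # Run on master1
   jps | grep -E 'NameNode|JournalNode'
   ```
   ❌ If there is no `NameNode` or `JournalNode` process, the service has not started:
   ```bash
   # Start the NameNode (in HA mode)
   hdfs --daemon start namenode
   # Start the JournalNode (the HA component from reference [2][^2])
   hdfs --daemon start journalnode
   ```
2. **Check the system log for the cause of a failed start**
   ```bash
   grep -i error /var/log/messages | grep hadoop
   ```

### 4. Disk space problems

1. **Check the storage state**
   ```bash
   df -h /path/to/hadoop/logs  # check free disk space
   df -i /path/to/hadoop/logs  # check free inodes
   ```

### 5. SELinux/AppArmor restrictions

1. **Disable temporarily as a test**
   ```bash
   setenforce 0              # CentOS/RHEL
   systemctl stop apparmor   # Ubuntu
   ```
   If log writing resumes, adjust the security policy:
   ```bash
   audit2allow -a  # generate new policy rules from the audit log
   ```

### 6. Logging framework failure

1. **Check the Log4j configuration**
   ```bash
   grep log4j $HADOOP_HOME/etc/hadoop/*.xml
   ls -l $HADOOP_HOME/etc/hadoop/log4j.properties  # confirm the file is present
   ```
   Confirm that `log4j.properties` exists and that its path is correct (by default it lives in `$HADOOP_HOME/etc/hadoop/`).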
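
Before running the checks, it can help to list exactly which daemons have stopped writing their `.log` file. The following shell sketch compares each `.out` file with its sibling `.log` in a log directory; the function name `stale_daemon_logs` is hypothetical and not part of Hadoop. It treats a `.log` that is older than its `.out`, or missing altogether, as suspicious. That is a heuristic, since a freshly restarted daemon can briefly show the same pattern.

```shell
#!/usr/bin/env bash
# Sketch: flag daemons whose .log file is missing or older than their .out file.
# The function name and the STALE/MISSING labels are illustrative assumptions.
stale_daemon_logs() {
  local dir="$1" out log
  for out in "$dir"/*.out; do
    [ -e "$out" ] || continue          # no .out files at all
    log="${out%.out}.log"              # sibling log of the same daemon
    if [ ! -e "$log" ]; then
      echo "MISSING ${log##*/}"
    elif [ "$out" -nt "$log" ]; then   # .out modified more recently than .log
      echo "STALE ${log##*/}"
    fi
  done
}
```

A call such as `stale_daemon_logs /data/module/hadoop-3.3.4/logs` on each master makes it easy to compare nodes side by side. Rotated files such as `.out.1` are ignored because only names ending in `.out` are matched.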