Hive Pitfalls We've Stepped In Over the Years

This article works through a series of problems encountered when running Hive in practice, including a missing MySQL driver JAR, metastore database initialization, "Relative path in absolute URI" errors, failed Hive table creation, and more, with concrete fixes for each.

Original post: https://blog.youkuaiyun.com/sunnyyoona/article/details/51648871

1. Missing MySQL Driver JAR

1.1 Problem

Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
	at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)

1.2 Solution

The error above almost always means the MySQL JDBC driver JAR is missing. Download MySQL Connector/J (e.g. mysql-connector-java-5.1.34) and copy the JAR into Hive's lib directory:

xiaosi@yoona:~$ cp mysql-connector-java-5.1.34-bin.jar opt/hive-2.1.0/lib/
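
A quick sanity check that the driver JAR is now in place (paths as above):

xiaosi@yoona:~$ ls opt/hive-2.1.0/lib/ | grep mysql-connector
mysql-connector-java-5.1.34-bin.jar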

2. MySQL Metastore Initialization

2.1 Problem

Running the ./hive script, the CLI fails to start and reports:

Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use 
schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include 
the option to auto-create the underlying database in your JDBC connection string (
e.g. ?createDatabaseIfNotExist=true for mysql)

2.2 Solution

From the scripts directory, run schematool -initSchema -dbType mysql to initialize the Hive metastore database:

xiaosi@yoona:~/opt/hive-2.1.0/scripts$  schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/xiaosi/opt/hive-2.1.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/xiaosi/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true
Metastore Connection Driver :     com.mysql.jdbc.Driver
Metastore connection User:     root
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
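
For reference, the connection URL, driver, and user printed above come from hive-site.xml. A minimal sketch of those properties (values as in this setup; adjust the database name and credentials for your environment):

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>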

3. Relative path in absolute URI

3.1 Problem

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 12 more

3.2 Solution

This error occurs because the configuration references variables that are never defined. To fix it, define the two variables system:user.name and system:java.io.tmpdir in hive-site.xml; other settings in the file can then reference them:

<property>
    <name>system:user.name</name>
    <value>xiaosi</value>
</property>
<property>
    <name>system:java.io.tmpdir</name>
    <value>/home/${system:user.name}/tmp/hive/</value>
</property>
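
Alternatively, if you prefer not to define the variables, you can give the affected scratch-directory properties absolute paths instead. A sketch, assuming /home/xiaosi/tmp/hive exists and is writable (hive.exec.local.scratchdir and hive.downloaded.resources.dir are the stock properties whose default values reference these variables):

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/xiaosi/tmp/hive</value>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/xiaosi/tmp/hive/${hive.session.id}_resources</value>
</property>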

4. Connection Refused

4.1 Problem

on exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
Caused by: java.net.ConnectException: Call From Qunar/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 29 more

4.2 Solution

Hadoop may simply not be running. Checking current processes with jps:

xiaosi@yoona:~/opt/hive-2.1.0$ jps
7317 Jps

As we can see, Hadoop is indeed not running. Start the HDFS daemons (NameNode and DataNode):

xiaosi@yoona:~/opt/hadoop-2.7.3$ ./sbin/start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-namenode-yoona.out
localhost: starting datanode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-datanode-yoona.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-secondarynamenode-yoona.out
xiaosi@yoona:~/opt/hadoop-2.7.3$ jps
8055 Jps
7561 NameNode
7929 SecondaryNameNode
7724 DataNode
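
Since the failed call targeted localhost:9000 (the fs.defaultFS address in this setup), you can additionally confirm that the NameNode RPC port is now listening:

xiaosi@yoona:~/opt/hadoop-2.7.3$ sudo netstat -anp | grep 9000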

5. Hive Table Creation Fails

5.1 Problem

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

5.2 Solution

The Hive log shows the following error:

NestedThrowablesStackTrace:
Could not create "increment"/"table" value-generation container `SEQUENCE_TABLE` since autoCreate flags do not allow it. 
org.datanucleus.exceptions.NucleusUserException: Could not create "increment"/"table" value-generation container `SEQUENCE_TABLE` since autoCreate flags do not allow it.

The root cause is that MySQL's binlog format defaults to STATEMENT. Check the current value in MySQL with show variables like 'binlog_format';:

mysql> show variables like 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
1 row in set (0.00 sec)

Change the default: add binlog_format="MIXED" to the MySQL configuration file /etc/mysql/mysql.conf.d/mysqld.cnf, restart MySQL, and then start Hive again.
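
A minimal sketch of the change in mysqld.cnf (the entry goes under the [mysqld] section):

[mysqld]
binlog_format = MIXED

After restarting MySQL, verify: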

mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | MIXED |
+---------------+-------+
1 row in set (0.00 sec)

Run the CREATE TABLE statement again:

hive> create table  if not exists employees(
    >    name string comment 'name',
    >    salary float comment 'salary',
    >    subordinates array<string> comment 'subordinates',
    >    deductions map<string,float> comment 'deductions',
    >    address struct<city:string,province:string> comment 'home address'
    > )
    > comment 'employee information table'
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY  '\n'
    > STORED AS TEXTFILE;
OK
Time taken: 0.664 seconds

6. Loading Data Fails

6.1 Problem

hive> load data local inpath '/home/xiaosi/hive/input/result.txt' overwrite into table recent_attention;
Loading data to table test_db.recent_attention
Failed with exception Unable to move source file:/home/xiaosi/hive/input/result.txt to destination hdfs://localhost:9000/user/hive/warehouse/test_db.db/recent_attention/result.txt
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The Hive log shows:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /home/xiaosi/hive/warehouse/recent_attention/result.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

Seeing "0 datanode(s) running", we suspect the DataNode has died; jps confirms it: the DataNode is indeed not running.

6.2 Solution

The failure is caused by the DataNode being down. For why the DataNode failed to start, see the companion post: Hadoop Pitfalls We've Stepped In (http://blog.youkuaiyun.com/sunnyyoona/article/details/51659080).
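
Once the DataNode is running again, you can confirm that HDFS reports a live node before retrying the load:

xiaosi@yoona:~$ hdfs dfsadmin -report | grep "Live datanodes"
Live datanodes (1):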

7. Hive JDBC Driver Not Found in Java

7.1 Problem

java.lang.ClassNotFoundException: org.apache.hadoop.hive.jdbc.HiveDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_91]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_91]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[na:1.8.0_91]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_91]
    at java.lang.Class.forName0(Native Method) ~[na:1.8.0_91]
    at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_91]
    at com.sjf.open.hive.HiveClient.getConn(HiveClient.java:29) [classes/:na]
    at com.sjf.open.hive.HiveClient.run(HiveClient.java:53) [classes/:na]
    at com.sjf.open.hive.HiveClient.main(HiveClient.java:77) [classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_91]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_91]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_91]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_91]
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) [idea_rt.jar:na]

7.2 Solution

The class org.apache.hadoop.hive.jdbc.HiveDriver is the old HiveServer1 driver. Use the HiveServer2 driver class instead:

private static String driverName = "org.apache.hive.jdbc.HiveDriver";

in place of

private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

8. CREATE TABLE Problem

8.1 Problem

create table if not exists employee(
   name string comment 'employee name',
   salary float comment 'employee salary',
   subordinates array<string> comment 'names of subordinates',
   deductions map<string,float> comment 'keys are deductions values are percentages',
   address struct<street:string, city:string, state:string, zip:int> comment 'home address'
)
comment 'description of the table'
tblproperties ('creator'='yoona','date'='20160719')
location '/user/hive/warehouse/test.db/employee';

Error message:

FAILED: ParseException line 10:0 missing EOF at 'location' near ')'

8.2 Solution

Put LOCATION before TBLPROPERTIES:

create table if not exists employee(
   name string comment 'employee name',
   salary float comment 'employee salary',
   subordinates array<string> comment 'names of subordinates',
   deductions map<string,float> comment 'keys are deductions values are percentages',
   address struct<street:string, city:string, state:string, zip:int> comment 'home address'
)
comment 'description of the table'
location '/user/hive/warehouse/test.db/employee'
tblproperties ('creator'='yoona','date'='20160719');

CREATE TABLE reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable

9. Hive JDBC Connection Refused

9.1 Problem

15:00:50.815 [main] INFO  org.apache.hive.jdbc.Utils - Supplied authorities: localhost:10000
15:00:50.832 [main] INFO  org.apache.hive.jdbc.Utils - Resolved authority: localhost:10000
15:00:51.010 [main] DEBUG o.a.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@3ffc5af1
15:00:51.019 [main] WARN  org.apache.hive.jdbc.HiveConnection - Failed to connect to localhost:10000
15:00:51.027 [main] ERROR com.sjf.open.hive.HiveClient - Connection error!
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default: java.net.ConnectException: Connection refused
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:219) ~[hive-jdbc-2.1.0.jar:2.1.0]
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:157) ~[hive-jdbc-2.1.0.jar:2.1.0]
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107) ~[hive-jdbc-2.1.0.jar:2.1.0]
    at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[na:1.8.0_91]
    at java.sql.DriverManager.getConnection(DriverManager.java:247) ~[na:1.8.0_91]
    at com.sjf.open.hive.HiveClient.getConn(HiveClient.java:29) [classes/:na]
    at com.sjf.open.hive.HiveClient.run(HiveClient.java:52) [classes/:na]
    at com.sjf.open.hive.HiveClient.main(HiveClient.java:76) [classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_91]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_91]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_91]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_91]
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) [idea_rt.jar:na]
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.9.3.jar:0.9.3]
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) ~[libthrift-0.9.3.jar:0.9.3]
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:195) ~[hive-jdbc-2.1.0.jar:2.1.0]
    ... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_91]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_91]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_91]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_91]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_91]
    at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_91]
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.9.3.jar:0.9.3]
    ... 15 common frames omitted

9.2 Solution

(1) Check whether HiveServer2 is running:

xiaosi@Qunar:/opt/apache-hive-2.0.0-bin/bin$ sudo netstat -anp | grep 10000

If HiveServer2 is not running, start the service first:

xiaosi@Qunar:/opt/apache-hive-2.0.0-bin/conf$ hive --service hiveserver2 >/dev/null 2>/dev/null &
[1] 11978

(2) Check the port configuration in hive-site.xml:

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
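
With HiveServer2 up and the port confirmed, a quick end-to-end check from the shell (Beeline ships with Hive; the JDBC URL matches the one in the error above):

xiaosi@Qunar:/opt/apache-hive-2.0.0-bin/bin$ beeline -u jdbc:hive2://localhost:10000/default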

10. User root is not allowed to impersonate anonymous

10.1 Problem

Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: xiaosi is not allowed to impersonate anonymous

10.2 Solution

Modify the Hadoop configuration file etc/hadoop/core-site.xml and add the following properties:

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

Note: in hadoop.proxyuser.XXX.hosts and hadoop.proxyuser.XXX.groups, XXX is the username that appears after "User:" in the exception. For the error above (user xiaosi):

<property>
    <name>hadoop.proxyuser.xiaosi.hosts</name>
    <value>*</value>
    <description>Hosts from which xiaosi may impersonate other users (* allows all hosts)</description>
</property>
<property>
    <name>hadoop.proxyuser.xiaosi.groups</name>
    <value>*</value>
    <description>Groups whose members xiaosi may impersonate (* allows all groups)</description>
</property>
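
Note that the NameNode has to pick up the core-site.xml change. Besides a full restart, the proxy-user settings can usually be reloaded on a running cluster with a standard HDFS admin command:

xiaosi@yoona:~$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
Refresh super user groups configuration successful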

11. Safe Mode

11.1 Problem

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/xiaosi/c2f6130d-3207-4360-8734-dba0462bd76c. Name node is in safe mode.
The reported blocks 22 has reached the threshold 0.9990 of total blocks 22. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 5 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3893)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy32.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy33.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3000)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2970)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1043)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
    at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:682)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:617)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:526)
    ... 9 more

11.2 Analysis

When HDFS starts, it enters safe mode, during which the contents of the file system can be neither modified nor deleted, until safe mode ends. Its main purpose is to check the validity of the data blocks on each DataNode during startup and, following the replication policy, copy or delete blocks as necessary. Safe mode can also be entered on demand at runtime via a command. In practice, attempting to modify or delete files while the system is still starting up produces exactly this "safe mode" error; usually it is enough to simply wait a moment.

11.3 Solution

You can either wait for HDFS to leave safe mode on its own, or leave it manually:

xiaosi@yoona:~$ hdfs dfsadmin -safemode leave
Safe mode is OFF
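
To check the current state first (as the log above notes, safe mode often turns off by itself within seconds):

xiaosi@yoona:~$ hdfs dfsadmin -safemode get
Safe mode is OFF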

 
