Problem 1. Today I noticed something odd about the HDFS root directory: it looks as if /spark/ is being treated as the / root directory:
[hadoop2@hadoop1 tmp]$ hdfs dfs -ls hdfs:/
Found 10 items
drwxr-xr-x - hadoop supergroup 0 2018-06-29 09:41 hdfs:///hbase
drwxr-xr-x - hadoop supergroup 0 2018-08-08 15:29 hdfs:///jwb
drwxr-xr-x - hadoop supergroup 0 2017-08-18 22:15 hdfs:///system
drwx------ - hadoop supergroup 0 2017-11-01 18:41 hdfs:///tmp
drwx------ - spark supergroup 0 2017-05-25 16:57 hdfs:///user
drwxr-xr-x - hadoop supergroup 0 2017-09-18 15:48 hdfs:///usr
[hadoop2@hadoop1 tmp]$ hdfs dfs -ls hdfs://
Found 10 items
drwxr-xr-x - hadoop supergroup 0 2018-06-29 09:41 hdfs:///hbase
drwxr-xr-x - hadoop supergroup 0 2018-08-08 15:29 hdfs:///jwb
drwxr-xr-x - hadoop supergroup 0 2017-08-18 22:15 hdfs:///system
drwx------ - hadoop supergroup 0 2017-11-01 18:41 hdfs:///tmp
drwx------ - spark supergroup 0 2017-05-25 16:57 hdfs:///user
drwxr-xr-x - hadoop supergroup 0 2017-09-18 15:48 hdfs:///usr
[hadoop@hadoop1 tmp]$ hdfs dfs -ls hdfs://spark
Found 1 items
drwx------ - spark supergroup 0 2018-08-08 08:00 hdfs://spark/user/hadoop/.Trash
[hadoop2@hadoop1 tmp]$ hdfs dfs -ls /
Found 10 items
drwxr-xr-x - hadoop supergroup 0 2017-02-09 16:58 /data2
drwxr-xr-x - hadoop supergroup 0 2018-06-29 09:41 /hbase
drwxr-xr-x - hadoop supergroup 0 2018-08-08 15:29 /jwb
drwxr-xr-x - hadoop supergroup 0 2017-08-18 22:15 /system
drwx------ - hadoop supergroup 0 2017-11-01 18:41 /tmp
drwx------ - spark supergroup 0 2017-05-25 16:57 /user
drwxr-xr-x - hadoop supergroup 0 2017-09-18 15:48 /usr
[hadoop2@hadoop1 tmp]$ hdfs dfs -ls hdfs://spark/hbase
Found 8 items
drwxr-xr-x - hbase supergroup 0 2018-06-29 09:41 hdfs://spark/hbase/.tmp
drwxr-xr-x - hbase supergroup 0 2018-06-29 09:42 hdfs://spark/hbase/WALs
drwxr-xr-x - hbase supergroup 0 2018-06-29 20:48 hdfs://spark/hbase/archive
drwxr-xr-x - hbase supergroup 0 2017-08-19 10:33 hdfs://spark/hbase/corrupt
drwxr-xr-x - hbase supergroup 0 2017-01-07 18:42 hdfs://spark/hbase/data
-rw-r--r-- 1 hbase supergroup 42 2017-01-07 18:40 hdfs://spark/hbase/hbase.id
-rw-r--r-- 1 hbase supergroup 7 2017-01-07 18:40 hdfs://spark/hbase/hbase.version
drwxr-xr-x - hbase supergroup 0 2018-07-07 11:42 hdfs://spark/hbase/oldWALs
No other way of looking shows a spark directory anywhere (it really does look like /spark/ is simply the root directory).
I checked the configuration files and found the following settings, which don't match the ones in the official docs:
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://spark</value>
</property>
hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://spark/hbase</value>
</property>
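My current understanding (not verified): in fs.defaultFS, "spark" is the URI authority, i.e. the name of the HDFS cluster, not a directory. That would explain why hdfs://spark, hdfs:// and a plain / all list the same root and why no /spark directory exists anywhere. Whether "spark" is a host alias for the NameNode (in which case the default RPC port 8020 applies) or a logical HA nameservice ID defined in hdfs-site.xml, I haven't confirmed. A minimal sketch, assuming the standard Hadoop FileSystem API, of how a client sees this:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListRoot {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml from the classpath, so fs.defaultFS = hdfs://spark
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    // "spark" is the URI authority (the cluster name), not a path component,
    // so hdfs://spark, hdfs://spark/ and a plain / all resolve to the same root.
    println(fs.getUri) // hdfs://spark
    fs.listStatus(new Path("/")).foreach(st => println(st.getPath))
  }
}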
I tried changing it to the hdfs://master:9000 form shown on the official site, but the cluster failed to restart after the change;
moreover, when accessing HBase from Spark via conf.set("hbase.rootdir", Conf.hbaseRootdir), every one of the following values works (and that port 9110 can apparently be changed to anything!):
hbase.rootdir=hdfs://spark
hbase.rootdir1=hdfs://spark/hbase
hbase.rootdir2=hdfs://hadoop1:9110
hbase.rootdir3=hdfs://hadoop1:9110/hbase
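A plausible explanation for the arbitrary port: the HBase client never uses hbase.rootdir to reach the cluster; it locates everything through ZooKeeper (hbase.zookeeper.quorum), while hbase.rootdir is only read by the HMaster and RegionServers on the server side. So whatever I put into conf.set("hbase.rootdir", ...) on the Spark side is simply ignored when connecting. A rough client-side sketch under that assumption (HBase 1.x client API; the quorum host hadoop1 and table name test_table are made up for illustration):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    // The client only needs ZooKeeper to find the cluster; hbase.rootdir set
    // here is never used for the connection, which is why any value "works".
    conf.set("hbase.zookeeper.quorum", "hadoop1")              // assumed quorum host
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    val conn = ConnectionFactory.createConnection(conf)
    val table = conn.getTable(TableName.valueOf("test_table")) // made-up table name
    val result = table.get(new Get(Bytes.toBytes("row1")))     // made-up row key
    println(result)
    table.close()
    conn.close()
  }
}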
I hadn't seen this before, so I'm just noting it down for now.
Problem 2. A Scala + Maven MapReduce program can't specify the entry class the way http://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html does.
The official example run: $ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
Whereas mine:
[hadoop2@hadoop1 tmp]$ hadoop jar test-1.0-SNAPSHOT-jar-with-dependencies.jar test.WordCount /jwb/in /jwb/out
18/08/08 15:36:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoop230 (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://spark/jwb/in already exists
18/08/08 15:36:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoop230 (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://spark/jwb/in already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://spark/jwb/in already exists
It looks like an argument-position problem: presumably the assembly jar's manifest already declared a Main-Class, so hadoop jar ignored the explicit class name, treated test.WordCount as the first program argument, and took /jwb/in as the output directory. Dropping the class name, as below, worked (though I did manually edit MANIFEST.MF to set the entry point to Main-Class: test.WordCount):
[hadoop2@hadoop1 tmp]$ hadoop jar test-1.0-SNAPSHOT-jar-with-dependencies.jar /jwb/in/ /jwb/out/1
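To make the argument shift concrete: with Main-Class already in the manifest, hadoop jar hands everything after the jar name to that main, so in the failing run args was ("test.WordCount", "/jwb/in", "/jwb/out") and args(1) = /jwb/in became the output directory. The sketch below is a guess at roughly what such a driver looks like (not necessarily the real test.WordCount):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object WordCountDriver {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(WordCountDriver.getClass)
    // args(0) is taken as input, args(1) as output; if the class name is passed
    // as well, the paths shift by one and FileAlreadyExistsException follows.
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    // mapper/reducer settings omitted in this sketch
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}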