01. Add a hadoop group and user to the system. One thing to do before installing:
add a user named hadoop to the system, dedicated to Hadoop testing.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# sudo addgroup hadoop
Adding group `hadoop' (GID 1000) ...
Done.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# sudo adduser --ingroup hadoop hadoop
Adding user `hadoop' ...
Adding new user `hadoop' (1000) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
New password:LYP809834049
Retype new password:LYP809834049
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
02. Grant the hadoop user sudo privileges
Open /etc/sudoers and insert the following line:
hadoop ALL=(ALL:ALL) ALL
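A slightly safer way to do this (a sketch; it assumes /etc/sudoers includes /etc/sudoers.d, which Ubuntu does by default) is to use visudo, or to put the rule in its own drop-in file instead of editing /etc/sudoers directly:
sudo visudo                                               # edits /etc/sudoers with syntax checking
# or: keep the rule in a separate file
echo 'hadoop ALL=(ALL:ALL) ALL' | sudo tee /etc/sudoers.d/hadoop
sudo chmod 0440 /etc/sudoers.d/hadoop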
03. Install Java (Java was already installed here):
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
04. Extract and install Hadoop
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# tar zxvf hadoop-2.7.7.tar.gz
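If the tarball has not been downloaded yet, it can be fetched from the Apache archive first (a sketch; the mirror URL is an assumption, adjust as needed):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar zxvf hadoop-2.7.7.tar.gz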
05. Modify the configuration files in the following directory
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# pwd
/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# ls
capacity-scheduler.xml hadoop-metrics.properties httpfs-signature.secret log4j.properties ssl-client.xml.example
configuration.xsl hadoop-metrics2.properties httpfs-site.xml mapred-env.cmd ssl-server.xml.example
container-executor.cfg hadoop-policy.xml kms-acls.xml mapred-env.sh yarn-env.cmd
core-site.xml hdfs-site.xml kms-env.sh mapred-queues.xml.template yarn-env.sh
hadoop-env.cmd httpfs-env.sh kms-log4j.properties mapred-site.xml.template yarn-site.xml
hadoop-env.sh httpfs-log4j.properties kms-site.xml slaves
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim hadoop-env.sh
Add the following lines and source the file so they take effect:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
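If the JDK is installed somewhere else, its home directory can be located first (a sketch; it assumes the full JDK, not just the JRE, is installed):
readlink -f "$(which javac)" | sed 's:/bin/javac::'       # prints the directory to use as JAVA_HOME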
06. Edit the global environment variables
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim /etc/profile
Add the following lines and source the file so they take effect:
#hadoop
export HADOOP_HOME=/mnt/e/win_ubuntu/envs/hadoop-2.7.7
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
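To apply and quickly confirm the new variables (a sketch):
source /etc/profile
echo $HADOOP_HOME
which hadoop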
07. Add the new lines from /etc/profile to root's own environment file as well, then source it to apply
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim /root/.bashrc
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# source /root/.bashrc
08. Check whether hadoop is available
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# whereis hadoop
hadoop: /mnt/e/win_ubuntu/envs/hadoop-2.7.7/bin/hadoop /mnt/e/win_ubuntu/envs/hadoop-2.7.7/bin/hadoop.cmd
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# hadoop version
Hadoop 2.7.7
Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac
Compiled by stevel on 2018-07-18T22:47Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /mnt/e/win_ubuntu/envs/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar
09. Start Hadoop
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Error: Cannot find configuration directory: /etc/hadoop
starting yarn daemons
Error: Cannot find configuration directory: /etc/hadoop
The daemons need to be started in order (dfs first, then yarn):
[root@master sbin]# ./start-dfs.sh
[root@master sbin]# ./start-yarn.sh
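If the "Error: Cannot find configuration directory: /etc/hadoop" message above keeps appearing, a common fix (a sketch, matching this install's layout) is to export HADOOP_CONF_DIR explicitly before starting the daemons, for example in /etc/profile or hadoop-env.sh:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop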
10. Launch spark-shell; a Hadoop native-library warning appears
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# spark-shell
22/01/12 23:18:03 WARN Utils: Your hostname, Windows-2021WEO resolves to a loopback address: 127.0.1.1; using 192.168.1.4 instead (on interface eth0)
22/01/12 23:18:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/01/12 23:18:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11. Configure environment variables and source them to activate
Add the following to /etc/profile (and sync it to root's environment as before):
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/
Add the following to $SPARK_HOME/conf/spark-env.sh and apply it:
export LD_LIBRARY_PATH=$JAVA_LIBRARY_PATH
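After sourcing the updated files, the native-library setup can also be checked with Hadoop's bundled tool before relaunching spark-shell:
hadoop checknative -a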
12. Finally, check whether the Hadoop warning still appears in spark-shell
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/spark-2.4.5-bin-hadoop2.7/conf# spark-shell
22/01/12 23:35:13 WARN Utils: Your hostname, Windows-2021WEO resolves to a loopback address: 127.0.1.1; using 192.168.1.4 instead (on interface eth0)
22/01/12 23:35:13 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1642001741543).
Spark session available as 'spark'.
13. Passwordless SSH login for Hadoop
step1: Check whether a public/private key pair already exists (one already exists here; if not, generate one as sketched after this listing):
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# cd /root/.ssh/
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
id_rsa id_rsa.pub known_hosts
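If no key pair existed, one could be generated like this (a sketch; the empty passphrase is what makes the login passwordless):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa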
step2: Create the authorized_keys file
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
id_rsa id_rsa.pub known_hosts
(base) root@LAPTOP-P1LA53KS:~/.ssh# touch authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
authorized_keys id_rsa id_rsa.pub known_hosts
step3: Change the permissions of the authorized_keys file
(base) root@LAPTOP-P1LA53KS:~/.ssh# chmod 600 authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
authorized_keys id_rsa id_rsa.pub known_hosts
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls -al
total 8
drwx------ 1 root root 4096 Jan 21 17:04 .
drwx------ 1 root root 4096 Jan 21 16:25 ..
-rw------- 1 root root 0 Jan 21 17:04 authorized_keys
-rw------- 1 root root 2610 Jan 17 06:53 id_rsa
-rw-r--r-- 1 root root 575 Jan 17 06:53 id_rsa.pub
-rw-r--r-- 1 root root 888 Jan 19 00:27 known_hosts
step4: Append the public key to the authorized list (this key pair was evidently generated earlier when setting up GitHub):
(base) root@LAPTOP-P1LA53KS:~/.ssh# cat id_rsa.pub >> authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# cat authorized_keys
ssh-rsa (key contents omitted)
step5: Restart Hadoop and verify that a password is no longer required
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
starting yarn daemons
starting resourcemanager, logging to /mnt/e/hadoop-2.7.7/logs/yarn-root-resourcemanager-LAPTOP-P1LA53KS.out
localhost: starting nodemanager, logging to /mnt/e/hadoop-2.7.7/logs/yarn-root-nodemanager-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps -l
1314 kafka.Kafka
2952 org.apache.flink.runtime.taskexecutor.TaskManagerRunner
19689 sun.tools.jps.Jps
19100 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
477 org.apache.zookeeper.server.quorum.QuorumPeerMain
2414 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
19487 org.apache.hadoop.yarn.server.nodemanager.NodeManager
14. Configure Hadoop for the web UI
step1: Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
step2: Configure core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.5:9000</value>
</property>
</configuration>
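To confirm which values the client actually picks up (including the default temp-storage location that shows up again later under /tmp), hdfs getconf can be used (a sketch):
hdfs getconf -confKey fs.defaultFS        # should print hdfs://192.168.1.5:9000
hdfs getconf -confKey hadoop.tmp.dir      # /tmp/hadoop-${user.name} unless overridden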
step3: Format the HDFS file structure
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/etc/hadoop# hdfs namenode -format
22/01/22 04:18:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = LAPTOP-P1LA53KS.localdomain/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.7
step4: Start Hadoop and open the web UI
UI address: http://192.168.1.5:8088/cluster
Result: success
15. Create an HDFS directory and upload a file
Reference: https://blog.youkuaiyun.com/weixin_36250487/article/details/80634005
Create the HDFS directory as follows:
# list the files and subdirectories under the root directory / (absolute path)
hadoop fs -ls /
# create a new directory (absolute path)
hadoop fs -mkdir /Hadoopfiles
Upload a file to HDFS:
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# pwd
/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# ls kmean*
kmeans_data.txt
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -put kmeans_data.txt /Hadoopfiles/
Result:
put: File /Hadoopfiles/kmeans_data.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Restart and check:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-dfs.sh
Starting namenodes on [host.docker.internal]
host.docker.internal: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
9713 SecondaryNameNode
9171 NameNode
9930 Jps
Result: jps shows that the DataNode process did not start.
Check the datanode log:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7# cd logs/
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# ls
SecurityAuth-root.audit hadoop-root-namenode-LAPTOP-P1LA53KS.out.4 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.2
hadoop-root-datanode-LAPTOP-P1LA53KS.log hadoop-root-namenode-LAPTOP-P1LA53KS.out.5 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.3
hadoop-root-datanode-LAPTOP-P1LA53KS.out hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.log yarn-root-nodemanager-LAPTOP-P1LA53KS.out.4
hadoop-root-datanode-LAPTOP-P1LA53KS.out.1 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out yarn-root-nodemanager-LAPTOP-P1LA53KS.out.5
hadoop-root-datanode-LAPTOP-P1LA53KS.out.2 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.1 yarn-root-resourcemanager-LAPTOP-P1LA53KS.log
hadoop-root-datanode-LAPTOP-P1LA53KS.out.3 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.2 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out
hadoop-root-datanode-LAPTOP-P1LA53KS.out.4 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.3 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.1
hadoop-root-datanode-LAPTOP-P1LA53KS.out.5 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.4 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.2
hadoop-root-namenode-LAPTOP-P1LA53KS.log hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.5 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.3
hadoop-root-namenode-LAPTOP-P1LA53KS.out userlogs yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.4
hadoop-root-namenode-LAPTOP-P1LA53KS.out.1 yarn-root-nodemanager-LAPTOP-P1LA53KS.log yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.5
hadoop-root-namenode-LAPTOP-P1LA53KS.out.2 yarn-root-nodemanager-LAPTOP-P1LA53KS.out
hadoop-root-namenode-LAPTOP-P1LA53KS.out.3 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.1
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# ls *datanode* -al
-rwxrwxrwx 1 root root 215306 Jan 22 05:44 hadoop-root-datanode-LAPTOP-P1LA53KS.log
-rwxrwxrwx 1 root root 715 Jan 22 05:44 hadoop-root-datanode-LAPTOP-P1LA53KS.out
-rwxrwxrwx 1 root root 715 Jan 22 04:33 hadoop-root-datanode-LAPTOP-P1LA53KS.out.1
-rwxrwxrwx 1 root root 715 Jan 22 04:26 hadoop-root-datanode-LAPTOP-P1LA53KS.out.2
-rwxrwxrwx 1 root root 715 Jan 22 04:21 hadoop-root-datanode-LAPTOP-P1LA53KS.out.3
-rwxrwxrwx 1 root root 715 Jan 22 03:57 hadoop-root-datanode-LAPTOP-P1LA53KS.out.4
-rwxrwxrwx 1 root root 715 Jan 22 03:52 hadoop-root-datanode-LAPTOP-P1LA53KS.out.5
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# cat hadoop-root-datanode-LAPTOP-P1LA53KS.log
The cause of the error: the namenode clusterID and the datanode clusterID do not match, because HDFS was re-formatted and the two IDs are now out of sync.
2022-01-22 05:44:39,356 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/tmp/hadoop-root/dfs/data/
java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-root/dfs/data: namenode clusterID = CID-709b35c3-5162-4658-85a8-6859b5593bb8; datanode clusterID = CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
One odd thing turned up here: the VERSION file sits under /tmp at the filesystem root, and there is no tmp folder inside the Hadoop directory.
The likely cause is that hadoop.tmp.dir was not set in the earlier XML configuration, so the default /tmp/hadoop-<user> location is used.
(base) root@LAPTOP-P1LA53KS:/tmp# pwd
/tmp
(base) root@LAPTOP-P1LA53KS:/tmp# ls *hadoop* -al
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-datanode.pid
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-namenode.pid
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-secondarynamenode.pid
hadoop-root:
total 0
drwxr-xr-x 1 root root 4096 Jan 19 14:36 .
drwxrwxrwt 1 root root 4096 Jan 22 05:44 ..
drwxr-xr-x 1 root root 4096 Jan 22 04:21 dfs
drwxr-xr-x 1 root root 4096 Jan 22 05:42 nm-local-dir
(base) root@LAPTOP-P1LA53KS:/tmp# cd hadoop-root/dfs/
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs# ls
data name namesecondary
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs# cd data/current/
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# ls
BP-1116483734-127.0.1.1-1642796315158 VERSION
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# cat VERSION
#Sat Jan 22 04:26:11 CST 2022
storageID=DS-1ad2fdba-a088-44ef-8e47-bc83e24a305e
clusterID=CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
cTime=0
datanodeUuid=ed0c62cd-f60c-4546-badf-8c646c57a1df
storageType=DATA_NODE
layoutVersion=-56
Fix: edit dfs/data/current/VERSION and change the datanode's clusterID to the namenode's clusterID (a one-line sed for this is sketched below, after the two VERSION files).
First, look at the namenode:
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# pwd
/tmp/hadoop-root/dfs/name/current
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# ls V*
VERSION
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# cat VERSION
#Sat Jan 22 05:44:30 CST 2022
namespaceID=1777974116
clusterID=CID-709b35c3-5162-4658-85a8-6859b5593bb8
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1041636110-127.0.1.1-1642797147817
layoutVersion=-63
Now look at the datanode:
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# cat VERSION
#Sat Jan 22 04:26:11 CST 2022
storageID=DS-1ad2fdba-a088-44ef-8e47-bc83e24a305e
clusterID=CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
cTime=0
datanodeUuid=ed0c62cd-f60c-4546-badf-8c646c57a1df
storageType=DATA_NODE
layoutVersion=-56
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# pwd
/tmp/hadoop-root/dfs/data/current
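A one-line way to apply the fix described above, using the two IDs shown in these VERSION files (a sketch):
sed -i 's/^clusterID=.*/clusterID=CID-709b35c3-5162-4658-85a8-6859b5593bb8/' /tmp/hadoop-root/dfs/data/current/VERSION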
Kill the old processes and restart dfs; DataNode now starts:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
9713 SecondaryNameNode
9171 NameNode
21322 Jps
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# kill -9 9713 9171
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
21451 Jps
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ls
distribute-exclude.sh hdfs-config.sh refresh-namenodes.sh start-balancer.sh start-yarn.cmd stop-balancer.sh stop-yarn.cmd
hadoop-daemon.sh httpfs.sh slaves.sh start-dfs.cmd start-yarn.sh stop-dfs.cmd stop-yarn.sh
hadoop-daemons.sh kms.sh start-all.cmd start-dfs.sh stop-all.cmd stop-dfs.sh yarn-daemon.sh
hdfs-config.cmd mr-jobhistory-daemon.sh start-all.sh start-secure-dns.sh stop-all.sh stop-secure-dns.sh yarn-daemons.sh
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-dfs.sh
Starting namenodes on [host.docker.internal]
host.docker.internal: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
21750 NameNode
22295 SecondaryNameNode
21992 DataNode
22508 Jps
Try uploading the file to HDFS again; this time it succeeds:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# cd /mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets/
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# pwd
/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# ls kmean* -al
-rwxrwxrwx 1 root root 107 Dec 25 08:26 kmeans_data.txt
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-01-22 05:36 /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -put kmeans_data.txt /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-01-22 06:30 /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls -R /
drwxr-xr-x - root supergroup 0 2022-01-22 06:30 /Hadoopfiles
-rw-r--r-- 1 root supergroup 107 2022-01-22 06:30 /Hadoopfiles/kmeans_data.txt
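The uploaded contents can also be checked from the shell before moving on to the Java client (a sketch):
hadoop fs -cat /Hadoopfiles/kmeans_data.txt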
16. Read the HDFS file with Java code
package GadaiteGroupID.Hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConnectHdfs {
    public static void main(String[] args) {
        // Point the client at the NameNode address configured in core-site.xml
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.1.5:9000");
        // try-with-resources closes both the FileSystem and the stream
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream ins = fs.open(new Path("/Hadoopfiles/kmeans_data.txt"))) {
            // Read the file byte by byte and echo it to stdout
            int ch = ins.read();
            while (ch != -1) {
                System.out.print((char) ch);
                ch = ins.read();
            }
            System.out.println();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
Output:
ERROR [main] - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:382)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:397)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:390)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2820)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2816)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2682)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171)
at GadaiteGroupID.Hadoop.ConnectHdfs.main(ConnectHdfs.java:12)
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
0.4 0.4 0.4
0.5 0.5 0.5
9.8 0.8 0.8
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Process finished with exit code 0
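Note: the winutils error appears because the Java client was run from the Windows side; Hadoop's Shell utilities look for %HADOOP_HOME%\bin\winutils.exe there, and "null\bin\winutils.exe" means HADOOP_HOME was not set in that environment. It is only a warning in this case: as the printed values show, the file was still read from HDFS successfully.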