01. Add a hadoop group and user to the system. One thing to do before installing:
add a user named hadoop to the system, dedicated to Hadoop testing.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# sudo addgroup hadoop
Adding group `hadoop' (GID 1000) ...
Done.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# sudo adduser --ingroup hadoop hadoop
Adding user `hadoop' ...
Adding new user `hadoop' (1000) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
New password:LYP809834049
Retype new password:LYP809834049
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
02. Grant the hadoop user sudo privileges
Open /etc/sudoers and insert the following line:
hadoop ALL=(ALL:ALL) ALL
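A slightly safer way to do this (a sketch; it assumes /etc/sudoers includes /etc/sudoers.d, which Ubuntu does by default) is to use visudo, or to put the rule in its own drop-in file instead of editing /etc/sudoers directly:
sudo visudo                                               # edits /etc/sudoers with syntax checking
# or: keep the rule in a separate file
echo 'hadoop ALL=(ALL:ALL) ALL' | sudo tee /etc/sudoers.d/hadoop
sudo chmod 0440 /etc/sudoers.d/hadoop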
03. Install Java (Java was already installed here):
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
04. Extract and install Hadoop
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs# tar zxvf hadoop-2.7.7.tar.gz
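If the tarball has not been downloaded yet, it can be fetched from the Apache archive first (a sketch; the mirror URL is an assumption, adjust as needed):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar zxvf hadoop-2.7.7.tar.gz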
05. Modify the configuration files in the following directory
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# pwd
/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# ls
capacity-scheduler.xml hadoop-metrics.properties httpfs-signature.secret log4j.properties ssl-client.xml.example
configuration.xsl hadoop-metrics2.properties httpfs-site.xml mapred-env.cmd ssl-server.xml.example
container-executor.cfg hadoop-policy.xml kms-acls.xml mapred-env.sh yarn-env.cmd
core-site.xml hdfs-site.xml kms-env.sh mapred-queues.xml.template yarn-env.sh
hadoop-env.cmd httpfs-env.sh kms-log4j.properties mapred-site.xml.template yarn-site.xml
hadoop-env.sh httpfs-log4j.properties kms-site.xml slaves
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim hadoop-env.sh
Add the following lines and source the file so they take effect:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
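If the JDK is installed somewhere else, its home directory can be located first (a sketch; it assumes the full JDK, not just the JRE, is installed):
readlink -f "$(which javac)" | sed 's:/bin/javac::'       # prints the directory to use as JAVA_HOME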
06. Edit the global environment variables
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim /etc/profile
Add the following lines and source the file so they take effect:
#hadoop
export HADOOP_HOME=/mnt/e/win_ubuntu/envs/hadoop-2.7.7
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
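To apply and quickly confirm the new variables (a sketch):
source /etc/profile
echo $HADOOP_HOME
which hadoop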
07. Add the new lines from /etc/profile to root's own environment file as well, then source it to apply
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# vim /root/.bashrc
root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# source /root/.bashrc
08. Check whether hadoop is available
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# whereis hadoop
hadoop: /mnt/e/win_ubuntu/envs/hadoop-2.7.7/bin/hadoop /mnt/e/win_ubuntu/envs/hadoop-2.7.7/bin/hadoop.cmd
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# hadoop version
Hadoop 2.7.7
Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac
Compiled by stevel on 2018-07-18T22:47Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /mnt/e/win_ubuntu/envs/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar
09. Start Hadoop
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Error: Cannot find configuration directory: /etc/hadoop
starting yarn daemons
Error: Cannot find configuration directory: /etc/hadoop
The daemons need to be started in order (dfs first, then yarn):
[root@master sbin]# ./start-dfs.sh
[root@master sbin]# ./start-yarn.sh
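If the "Error: Cannot find configuration directory: /etc/hadoop" message above keeps appearing, a common fix (a sketch, matching this install's layout) is to export HADOOP_CONF_DIR explicitly before starting the daemons, for example in /etc/profile or hadoop-env.sh:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop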
10. Launch spark-shell; a Hadoop native-library warning appears
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/hadoop-2.7.7/etc/hadoop# spark-shell
22/01/12 23:18:03 WARN Utils: Your hostname, Windows-2021WEO resolves to a loopback address: 127.0.1.1; using 192.168.1.4 instead (on interface eth0)
22/01/12 23:18:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/01/12 23:18:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11. Configure environment variables and source them to activate
Add the following to /etc/profile (and sync it to root's environment as before):
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/
Add the following to $SPARK_HOME/conf/spark-env.sh and apply it:
export LD_LIBRARY_PATH=$JAVA_LIBRARY_PATH
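After sourcing the updated files, the native-library setup can also be checked with Hadoop's bundled tool before relaunching spark-shell:
hadoop checknative -a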
12. Finally, check whether the Hadoop warning still appears in spark-shell
(base) root@Windows-2021WEO:/mnt/e/win_ubuntu/envs/spark-2.4.5-bin-hadoop2.7/conf# spark-shell
22/01/12 23:35:13 WARN Utils: Your hostname, Windows-2021WEO resolves to a loopback address: 127.0.1.1; using 192.168.1.4 instead (on interface eth0)
22/01/12 23:35:13 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1642001741543).
Spark session available as 'spark'.
13. Passwordless SSH login for Hadoop
step1: Check whether a public/private key pair already exists (one already exists here; if not, generate one as sketched after this listing):
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# cd /root/.ssh/
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
id_rsa id_rsa.pub known_hosts
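If no key pair existed, one could be generated like this (a sketch; the empty passphrase is what makes the login passwordless):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa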
step2: Create the authorized_keys file
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
id_rsa id_rsa.pub known_hosts
(base) root@LAPTOP-P1LA53KS:~/.ssh# touch authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
authorized_keys id_rsa id_rsa.pub known_hosts
step3: Change the permissions of the authorized_keys file
(base) root@LAPTOP-P1LA53KS:~/.ssh# chmod 600 authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls
authorized_keys id_rsa id_rsa.pub known_hosts
(base) root@LAPTOP-P1LA53KS:~/.ssh# ls -al
total 8
drwx------ 1 root root 4096 Jan 21 17:04 .
drwx------ 1 root root 4096 Jan 21 16:25 ..
-rw------- 1 root root 0 Jan 21 17:04 authorized_keys
-rw------- 1 root root 2610 Jan 17 06:53 id_rsa
-rw-r--r-- 1 root root 575 Jan 17 06:53 id_rsa.pub
-rw-r--r-- 1 root root 888 Jan 19 00:27 known_hosts
step4: Append the public key to the authorized list (this key pair was evidently generated earlier when setting up GitHub):
(base) root@LAPTOP-P1LA53KS:~/.ssh# cat id_rsa.pub >> authorized_keys
(base) root@LAPTOP-P1LA53KS:~/.ssh# cat authorized_keys
ssh-rsa (key contents omitted)
step5: Restart Hadoop and verify that a password is no longer required
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
starting yarn daemons
starting resourcemanager, logging to /mnt/e/hadoop-2.7.7/logs/yarn-root-resourcemanager-LAPTOP-P1LA53KS.out
localhost: starting nodemanager, logging to /mnt/e/hadoop-2.7.7/logs/yarn-root-nodemanager-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps -l
1314 kafka.Kafka
2952 org.apache.flink.runtime.taskexecutor.TaskManagerRunner
19689 sun.tools.jps.Jps
19100 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
477 org.apache.zookeeper.server.quorum.QuorumPeerMain
2414 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
19487 org.apache.hadoop.yarn.server.nodemanager.NodeManager
14. Configure Hadoop for the web UI
step1: Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
step2: Configure core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.5:9000</value>
</property>
</configuration>
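To confirm which values the client actually picks up (including the default temp-storage location that shows up again later under /tmp), hdfs getconf can be used (a sketch):
hdfs getconf -confKey fs.defaultFS        # should print hdfs://192.168.1.5:9000
hdfs getconf -confKey hadoop.tmp.dir      # /tmp/hadoop-${user.name} unless overridden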
step3: Format the HDFS file structure
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/etc/hadoop# hdfs namenode -format
22/01/22 04:18:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = LAPTOP-P1LA53KS.localdomain/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.7
step4: Start Hadoop and open the web UI
UI address: http://192.168.1.5:8088/cluster
Result: success
15. Create an HDFS directory and upload a file
Reference: https://blog.youkuaiyun.com/weixin_36250487/article/details/80634005
Create the HDFS directory as follows:
# list the files and subdirectories under the root directory / (absolute path)
hadoop fs -ls /
# create a new directory (absolute path)
hadoop fs -mkdir /Hadoopfiles
Upload a file to HDFS:
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# pwd
/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# ls kmean*
kmeans_data.txt
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -put kmeans_data.txt /Hadoopfiles/
Result:
put: File /Hadoopfiles/kmeans_data.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Restart and check:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-dfs.sh
Starting namenodes on [host.docker.internal]
host.docker.internal: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
9713 SecondaryNameNode
9171 NameNode
9930 Jps
Result: jps shows that the DataNode process did not start.
Check the datanode log:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7# cd logs/
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# ls
SecurityAuth-root.audit hadoop-root-namenode-LAPTOP-P1LA53KS.out.4 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.2
hadoop-root-datanode-LAPTOP-P1LA53KS.log hadoop-root-namenode-LAPTOP-P1LA53KS.out.5 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.3
hadoop-root-datanode-LAPTOP-P1LA53KS.out hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.log yarn-root-nodemanager-LAPTOP-P1LA53KS.out.4
hadoop-root-datanode-LAPTOP-P1LA53KS.out.1 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out yarn-root-nodemanager-LAPTOP-P1LA53KS.out.5
hadoop-root-datanode-LAPTOP-P1LA53KS.out.2 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.1 yarn-root-resourcemanager-LAPTOP-P1LA53KS.log
hadoop-root-datanode-LAPTOP-P1LA53KS.out.3 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.2 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out
hadoop-root-datanode-LAPTOP-P1LA53KS.out.4 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.3 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.1
hadoop-root-datanode-LAPTOP-P1LA53KS.out.5 hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.4 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.2
hadoop-root-namenode-LAPTOP-P1LA53KS.log hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out.5 yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.3
hadoop-root-namenode-LAPTOP-P1LA53KS.out userlogs yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.4
hadoop-root-namenode-LAPTOP-P1LA53KS.out.1 yarn-root-nodemanager-LAPTOP-P1LA53KS.log yarn-root-resourcemanager-LAPTOP-P1LA53KS.out.5
hadoop-root-namenode-LAPTOP-P1LA53KS.out.2 yarn-root-nodemanager-LAPTOP-P1LA53KS.out
hadoop-root-namenode-LAPTOP-P1LA53KS.out.3 yarn-root-nodemanager-LAPTOP-P1LA53KS.out.1
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# ls *datanode* -al
-rwxrwxrwx 1 root root 215306 Jan 22 05:44 hadoop-root-datanode-LAPTOP-P1LA53KS.log
-rwxrwxrwx 1 root root 715 Jan 22 05:44 hadoop-root-datanode-LAPTOP-P1LA53KS.out
-rwxrwxrwx 1 root root 715 Jan 22 04:33 hadoop-root-datanode-LAPTOP-P1LA53KS.out.1
-rwxrwxrwx 1 root root 715 Jan 22 04:26 hadoop-root-datanode-LAPTOP-P1LA53KS.out.2
-rwxrwxrwx 1 root root 715 Jan 22 04:21 hadoop-root-datanode-LAPTOP-P1LA53KS.out.3
-rwxrwxrwx 1 root root 715 Jan 22 03:57 hadoop-root-datanode-LAPTOP-P1LA53KS.out.4
-rwxrwxrwx 1 root root 715 Jan 22 03:52 hadoop-root-datanode-LAPTOP-P1LA53KS.out.5
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/logs# cat hadoop-root-datanode-LAPTOP-P1LA53KS.log
The cause of the error: the namenode clusterID and the datanode clusterID do not match, because HDFS was re-formatted and the two IDs are now out of sync.
2022-01-22 05:44:39,356 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/tmp/hadoop-root/dfs/data/
java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-root/dfs/data: namenode clusterID = CID-709b35c3-5162-4658-85a8-6859b5593bb8; datanode clusterID = CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
One odd thing turned up here: the VERSION file sits under /tmp at the filesystem root, and there is no tmp folder inside the Hadoop directory.
The likely cause is that hadoop.tmp.dir was not set in the earlier XML configuration, so the default /tmp/hadoop-<user> location is used.
(base) root@LAPTOP-P1LA53KS:/tmp# pwd
/tmp
(base) root@LAPTOP-P1LA53KS:/tmp# ls *hadoop* -al
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-datanode.pid
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-namenode.pid
-rw-r--r-- 1 root root 5 Jan 22 05:44 hadoop-root-secondarynamenode.pid
hadoop-root:
total 0
drwxr-xr-x 1 root root 4096 Jan 19 14:36 .
drwxrwxrwt 1 root root 4096 Jan 22 05:44 ..
drwxr-xr-x 1 root root 4096 Jan 22 04:21 dfs
drwxr-xr-x 1 root root 4096 Jan 22 05:42 nm-local-dir
(base) root@LAPTOP-P1LA53KS:/tmp# cd hadoop-root/dfs/
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs# ls
data name namesecondary
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs# cd data/current/
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# ls
BP-1116483734-127.0.1.1-1642796315158 VERSION
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# cat VERSION
#Sat Jan 22 04:26:11 CST 2022
storageID=DS-1ad2fdba-a088-44ef-8e47-bc83e24a305e
clusterID=CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
cTime=0
datanodeUuid=ed0c62cd-f60c-4546-badf-8c646c57a1df
storageType=DATA_NODE
layoutVersion=-56
Fix: edit dfs/data/current/VERSION and change the datanode's clusterID to the namenode's clusterID (a one-line sed for this is sketched below, after the two VERSION files).
First, look at the namenode:
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# pwd
/tmp/hadoop-root/dfs/name/current
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# ls V*
VERSION
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/name/current# cat VERSION
#Sat Jan 22 05:44:30 CST 2022
namespaceID=1777974116
clusterID=CID-709b35c3-5162-4658-85a8-6859b5593bb8
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1041636110-127.0.1.1-1642797147817
layoutVersion=-63
Now look at the datanode:
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# cat VERSION
#Sat Jan 22 04:26:11 CST 2022
storageID=DS-1ad2fdba-a088-44ef-8e47-bc83e24a305e
clusterID=CID-8d8f8dcf-8cb1-4e7c-9010-c7d5eab887b0
cTime=0
datanodeUuid=ed0c62cd-f60c-4546-badf-8c646c57a1df
storageType=DATA_NODE
layoutVersion=-56
(base) root@LAPTOP-P1LA53KS:/tmp/hadoop-root/dfs/data/current# pwd
/tmp/hadoop-root/dfs/data/current
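A one-line way to apply the fix described above, using the two IDs shown in these VERSION files (a sketch):
sed -i 's/^clusterID=.*/clusterID=CID-709b35c3-5162-4658-85a8-6859b5593bb8/' /tmp/hadoop-root/dfs/data/current/VERSION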
Kill the old processes and restart dfs; DataNode now starts:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
9713 SecondaryNameNode
9171 NameNode
21322 Jps
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# kill -9 9713 9171
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
21451 Jps
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ls
distribute-exclude.sh hdfs-config.sh refresh-namenodes.sh start-balancer.sh start-yarn.cmd stop-balancer.sh stop-yarn.cmd
hadoop-daemon.sh httpfs.sh slaves.sh start-dfs.cmd start-yarn.sh stop-dfs.cmd stop-yarn.sh
hadoop-daemons.sh kms.sh start-all.cmd start-dfs.sh stop-all.cmd stop-dfs.sh yarn-daemon.sh
hdfs-config.cmd mr-jobhistory-daemon.sh start-all.sh start-secure-dns.sh stop-all.sh stop-secure-dns.sh yarn-daemons.sh
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# ./start-dfs.sh
Starting namenodes on [host.docker.internal]
host.docker.internal: starting namenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-namenode-LAPTOP-P1LA53KS.out
localhost: starting datanode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-datanode-LAPTOP-P1LA53KS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /mnt/e/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-LAPTOP-P1LA53KS.out
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# jps
21750 NameNode
22295 SecondaryNameNode
21992 DataNode
22508 Jps
Try uploading the file to HDFS again; this time it succeeds:
(base) root@LAPTOP-P1LA53KS:/mnt/e/hadoop-2.7.7/sbin# cd /mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets/
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# pwd
/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# ls kmean* -al
-rwxrwxrwx 1 root root 107 Dec 25 08:26 kmeans_data.txt
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-01-22 05:36 /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -put kmeans_data.txt /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-01-22 06:30 /Hadoopfiles
(base) root@LAPTOP-P1LA53KS:/mnt/f/CodeG50/sparkAll/src/main/scala/GadaiteGroupID/DataSets# hadoop fs -ls -R /
drwxr-xr-x - root supergroup 0 2022-01-22 06:30 /Hadoopfiles
-rw-r--r-- 1 root supergroup 107 2022-01-22 06:30 /Hadoopfiles/kmeans_data.txt
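The uploaded contents can also be checked from the shell before moving on to the Java client (a sketch):
hadoop fs -cat /Hadoopfiles/kmeans_data.txt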
16. Read the HDFS file with Java code
package GadaiteGroupID.Hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConnectHdfs {
    public static void main(String[] args) {
        // Point the client at the NameNode address configured in core-site.xml
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.1.5:9000");
        // try-with-resources closes both the FileSystem and the stream
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream ins = fs.open(new Path("/Hadoopfiles/kmeans_data.txt"))) {
            // Read the file byte by byte and echo it to stdout
            int ch = ins.read();
            while (ch != -1) {
                System.out.print((char) ch);
                ch = ins.read();
            }
            System.out.println();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
Output:
ERROR [main] - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:382)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:397)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:390)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2820)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2816)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2682)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171)
at GadaiteGroupID.Hadoop.ConnectHdfs.main(ConnectHdfs.java:12)
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
0.4 0.4 0.4
0.5 0.5 0.5
9.8 0.8 0.8
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Process finished with exit code 0
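Note: the winutils error appears because the Java client was run from the Windows side; Hadoop's Shell utilities look for %HADOOP_HOME%\bin\winutils.exe there, and "null\bin\winutils.exe" means HADOOP_HOME was not set in that environment. It is only a warning in this case: as the printed values show, the file was still read from HDFS successfully.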