Hadoop Pseudo-Distributed Installation

Machine environment

- Ubuntu 22.04.4 LTS
- IP: 10.0.0.203/24
- Hadoop 3.3.6

The five steps of Hadoop configuration

(1) Create a hadoop user
(2) Install Java
(3) Set up passwordless SSH login
(4) Standalone installation and configuration
(5) Pseudo-distributed installation and configuration

Installing standalone Hadoop

1. Create a hadoop user (officially recommended; mandatory in production), or run everything as root with the required environment variables (only suitable for a single-machine test/learning environment)

- Create the user:
adduser hadoop         # on Ubuntu, adduser prompts for a password
passwd hadoop          # or set/change the password explicitly

- Configure passwordless SSH login to localhost for the user that will start Hadoop:
# Generate an ssh key if you don't have one yet
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Append the public key to this machine's authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Test: this should log in without asking for a password
ssh localhost
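If you take the root route mentioned in step 1, note that Hadoop 3.x daemons refuse to start as root unless the run-as user for each daemon is declared. A minimal sketch, assuming you start everything as root (add to hadoop-env.sh or your shell profile):

```shell
# Without these, Hadoop 3.x aborts with
# "ERROR: Attempting to operate on hdfs namenode as root".
# Each variable names the user the corresponding daemon runs as.
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```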

2. Install OpenJDK

If installing offline, first upload the deb packages to /var/cache/apt/archives on the server.
Run sudo apt-get install openjdk-8-jdk to install.
The JDK binaries end up in /usr/lib/jvm/java-8-openjdk-amd64/bin.

3. Download and install the Hadoop package

Hadoop download address

root@ubt-2204# tar xf hadoop-3.3.6.tar.gz  -C /usr/local/
root@ubt-2204:/usr/local# cd /usr/local/
root@ubt-2204:/usr/local# chown -R hadoop:hadoop hadoop-3.3.6
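Optionally, put Hadoop on the PATH so the long /usr/local/hadoop-3.3.6/bin/... prefixes used below can be shortened. A sketch, assuming the install location used above (add to ~/.bashrc or /etc/profile.d/hadoop.sh):

```shell
# Make the hadoop/hdfs/yarn commands and the sbin start scripts
# available without typing full paths
export HADOOP_HOME=/usr/local/hadoop-3.3.6
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```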

4. Edit hadoop-env.sh to point JAVA_HOME at the system JDK

- Open the file in the Hadoop config directory:
root@ubt-2204:/usr/local/hadoop-3.3.6# vim /usr/local/hadoop-3.3.6/etc/hadoop/hadoop-env.sh

- Set the JAVA_HOME variable:
grep -En '^export' /usr/local/hadoop-3.3.6/etc/hadoop/hadoop-env.sh 
55:export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

- Apply the change in the current shell (Hadoop's launch scripts also source this file automatically on startup):
source /usr/local/hadoop-3.3.6/etc/hadoop/hadoop-env.sh

- Verify the installation:
/usr/local/hadoop-3.3.6/bin/hadoop version
-------------------------------------------------------
root@ubt-2204:/usr/local/hadoop-3.3.6/bin# /usr/local/hadoop-3.3.6/bin/hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /usr/local/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar

5. Test that Hadoop was installed correctly (note: the job fails if ./output already exists, so delete it before re-running)

cd /usr/local/hadoop-3.3.6
root@ubt-2204:/usr/local/hadoop-3.3.6# mkdir ./input
root@ubt-2204:/usr/local/hadoop-3.3.6# cp ./etc/hadoop/*.xml ./input
root@ubt-2204:/usr/local/hadoop-3.3.6# ./bin/hadoop jar /usr/local/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep ./input/ ./output/ 'dfs[a-z.]+'

- The run produces the following output files:
root@ubt-2204:/usr/local/hadoop-3.3.6# ll output/
total 20
drwxr-xr-x  2 root   root   4096 Aug  4 14:09 ./
drwxr-xr-x 12 hadoop hadoop 4096 Aug  4 14:09 ../
-rw-r--r--  1 root   root      8 Aug  4 14:09 ._SUCCESS.crc
-rw-r--r--  1 root   root     12 Aug  4 14:09 .part-r-00000.crc
-rw-r--r--  1 root   root      0 Aug  4 14:09 _SUCCESS
-rw-r--r--  1 root   root     11 Aug  4 14:09 part-r-00000
root@ubt-2204:/usr/local/hadoop-3.3.6# cat ./output/*
1	dfsadmin 
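For intuition: the example job simply extracts every string matching the regex dfs[a-z.]+ from the input XML files and counts the distinct matches. The same logic reproduced with plain grep on a throwaway sample file (/tmp/grep-demo and sample.xml are made up for this demo; the real job ran over etc/hadoop/*.xml):

```shell
# Extract every match of dfs[a-z.]+ and count occurrences of each
# distinct string, mirroring what the MapReduce grep example computes
mkdir -p /tmp/grep-demo
printf '<name>dfs.replication</name>\n<name>dfsadmin</name>\n' > /tmp/grep-demo/sample.xml
grep -ohE 'dfs[a-z.]+' /tmp/grep-demo/sample.xml | sort | uniq -c
```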

Hadoop pseudo-distributed installation

Configuration files and their roles:
- core-site.xml: core configuration; mainly defines the filesystem access scheme (hdfs://)
- hadoop-env.sh: mainly sets the Java path
- hdfs-site.xml: HDFS-specific settings
- mapred-site.xml: MapReduce-specific settings
- workers (named slaves before Hadoop 3): lists the worker nodes, i.e. which machines run DataNode and NodeManager
- yarn-site.xml: ResourceManager / resource-scheduling settings

1. For a pseudo-distributed setup we need to modify core-site.xml and hdfs-site.xml

- Edit the core-site.xml file:
root@ubt-2204:/usr/local/hadoop-3.3.6# vim /usr/local/hadoop-3.3.6/etc/hadoop/core-site.xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop-3.3.6/tmp</value>
    </property>
    <!-- Maximum interval (in seconds) between checkpoints -->
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
</configuration>

- Edit the hdfs-site.xml file:
root@ubt-2204:/usr/local/hadoop-3.3.6/etc/hadoop# vim /usr/local/hadoop-3.3.6/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- Number of replicas HDFS keeps for each block -->
    <property>
            <name>dfs.replication</name>
            <value>1</value>
    </property>
    <!-- Where the NameNode stores its metadata -->
    <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/usr/local/hadoop-3.3.6/tmp/dfs/name</value>
    </property>
    <!-- Where the DataNode stores its blocks -->
    <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/usr/local/hadoop-3.3.6/tmp/dfs/data</value>
    </property>

</configuration>
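The start-all.sh step below also brings up YARN; to actually run MapReduce jobs on it, two more settings are usually added. A sketch using the stock Hadoop property names (a single-node setup rarely needs more than this):

```xml
<!-- mapred-site.xml: tell MapReduce to run on YARN -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

<!-- yarn-site.xml: enable the shuffle auxiliary service reducers fetch from -->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```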

root@ubt-2204:/usr/local/hadoop-3.3.6# ./bin/hadoop namenode -format
......
2025-08-04 14:30:45,170 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2025-08-04 14:30:45,197 INFO namenode.FSNamesystem: Stopping services started for active state
2025-08-04 14:30:45,198 INFO namenode.FSNamesystem: Stopping services started for standby state
2025-08-04 14:30:45,204 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2025-08-04 14:30:45,205 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubt-2204/10.0.0.203
************************************************************/
If you see output like the above, the format succeeded. (hadoop namenode -format is deprecated in Hadoop 3; ./bin/hdfs namenode -format is the preferred form.)

2. Start Hadoop

root@ubt-2204:/usr/local/hadoop-3.3.6# ./sbin/start-all.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [ubt-2204]
Starting resourcemanager
resourcemanager is running as process 2073.  Stop it first and ensure /tmp/hadoop-root-resourcemanager.pid file is empty before retry.
Starting nodemanagers

# If jps shows the following Java processes, everything is running correctly
root@ubt-2204:/usr/local/hadoop-3.3.6# jps 
3553 Jps
3427 NodeManager
2888 DataNode
2073 ResourceManager
3114 SecondaryNameNode
2746 NameNode

3. Access the web UI
http://10.0.0.203:9870/
Note!!!

In Hadoop 3.x, the NameNode web UI port changed from 50070 to 9870.
