Hadoop Environment Setup

Prerequisites

Set JAVA_HOME in etc/hadoop/hadoop-env.sh:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre
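
To confirm the setting is picked up, bin/hadoop version should print the distribution version (2.5.0 in the examples below); if JAVA_HOME points at a missing JVM, the launcher scripts fail complaining about JAVA_HOME.

$ bin/hadoop version
Hadoop 2.5.0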
Standalone

In this mode, only the local file system can be operated on.

<!-- core-default.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Example usage

$ bin/hdfs dfs -ls .
Found 7 items
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 bin
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 etc
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 include
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 lib
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 libexec
drwxr-xr-x   - 501 wheel       4096 2020-05-17 00:53 sbin
drwxr-xr-x   - 501 wheel       4096 2014-08-06 17:46 share

$ mkdir input
$ cp etc/hadoop/*.xml input
# the output directory must not exist before the job runs
# usage of the hadoop-mapreduce-examples jar (2.5.0 in this walkthrough):
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
$ cat output/*
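
With the stock configuration files as input, the grep example typically finds a single match, so the expected output (counts may differ if your config files do) looks like:

1       dfsadmin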
Pseudo-Distributed
HDFS environment preparation

etc/hadoop/core-site.xml

<!-- default -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<!-- change to a directory that persists across reboots -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop/tmp</value>
</property>

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
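
For reference, the two overrides combined into a complete etc/hadoop/core-site.xml (a minimal sketch; /root/hadoop/tmp is just the example path used above):

<?xml version="1.0"?>
<configuration>
  <!-- persistent base directory for HDFS data instead of /tmp -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
  </property>
  <!-- use HDFS as the default file system -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>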

etc/hadoop/hdfs-site.xml

<!-- default -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication. 
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<!-- must be set to 1, since there is only one DataNode -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
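
To verify the override is in effect, hdfs getconf can read back the resolved value:

$ bin/hdfs getconf -confKey dfs.replication
1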

Configure passwordless SSH access (skip this step if already configured):

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
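
Afterwards, ssh localhost should log in without a password prompt; if it still asks, check that ~/.ssh itself is mode 0700 and the key files have the permissions above.

$ ssh localhost
$ exit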
# format the NameNode
$ bin/hdfs namenode -format
# start HDFS
$ sbin/start-dfs.sh
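
If startup succeeded, jps lists the three HDFS daemons (PIDs below are illustrative), and the NameNode web UI is reachable at http://localhost:50070/ (the Hadoop 2.x default port):

$ jps
2305 NameNode
2419 DataNode
2607 SecondaryNameNode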

The HDFS environment setup is complete.
Test

$ bin/hdfs dfs -put input /input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep /input /output 'dfs[a-z.]+'

# check the test results
$ bin/hdfs dfs -ls -R /output
-rw-r--r--   1 root supergroup          0 2020-05-17 01:50 /output/_SUCCESS
-rw-r--r--   1 root supergroup         11 2020-05-17 01:50 /output/part-r-00000
# when re-running, remember to clear the output directory first, as shown below
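
For example, to view the matches and then clear the previous run:

# view the matched terms
$ bin/hdfs dfs -cat /output/part-r-00000
# remove the output directory so the job can be re-run
$ bin/hdfs dfs -rm -r /output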
YARN environment preparation

etc/hadoop/mapred-site.xml

<!--default-->
<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.
  </description>
</property>
<!--change to yarn-->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
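
One caveat for Hadoop 2.x: the distribution ships only mapred-site.xml.template, so if etc/hadoop/mapred-site.xml does not exist yet, create it from the template before adding the property above:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml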

etc/hadoop/yarn-site.xml

<!--default-->
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value></value>
    <!--<value>mapreduce_shuffle</value>-->
  </property>
<!-- how the NodeManager runs MapReduce tasks: the shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

YARN environment configuration is complete.
Test

$ sbin/start-yarn.sh
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep /input /output 'dfs[a-z.]+'
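
If /output still exists from the HDFS test, the job fails, so delete it before re-running. Once YARN is up, jps should additionally show the two YARN daemons (PIDs illustrative), and running applications appear at http://localhost:8088/cluster:

$ bin/hdfs dfs -rm -r /output
$ jps
3012 ResourceManager
3115 NodeManager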
JobHistory

etc/hadoop/yarn-site.xml

<!-- enable YARN log aggregation -->
  <property>
    <description>Whether to enable log aggregation. Log aggregation collects
      each container's logs and moves these logs onto a file-system, for e.g.
      HDFS, after the application completes. Users can configure the
      "yarn.nodemanager.remote-app-log-dir" and
      "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
      where these logs are moved to. Users can access the logs via the
      Application Timeline Server.
    </description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
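
The history server itself listens on addresses configured in etc/hadoop/mapred-site.xml; the defaults from mapred-default.xml are fine for a single node and are listed here only for reference:

<!-- defaults from mapred-default.xml, shown for reference -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>0.0.0.0:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>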

Restart and test

$ sbin/stop-yarn.sh
$ sbin/start-yarn.sh
$ sbin/mr-jobhistory-daemon.sh start historyserver

Restart and run the test again; the history information can now be viewed at:
http://single-node:8088/cluster
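
The address above is the ResourceManager UI; completed jobs listed there link through to the JobHistory Server, whose own web UI can also be opened directly at http://single-node:19888/jobhistory (19888 being the default mapreduce.jobhistory.webapp.address port):

# confirm the history server daemon is up
$ jps | grep JobHistoryServer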
