1. Version selection
hadoop-2.6.0.tar.gz
zookeeper-3.4.6.tar.gz
This article is largely based on: http://www.aboutyun.com/thread-11909-1-1.html
2. Cluster plan
Hostname | IP address | Installed software | Running processes |
dehadp01 | 192.168.18.201 | jdk,hadoop | namenode,resourcemanager,zkfc |
dehadp02 | 192.168.18.202 | jdk,hadoop | namenode,resourcemanager,zkfc |
dehadp03 | 192.168.18.203 | jdk,hadoop,zookeeper | datanode,nodemanager,journalnode |
dehadp04 | 192.168.18.204 | jdk,hadoop,zookeeper | datanode,nodemanager,journalnode |
dehadp05 | 192.168.18.205 | jdk,hadoop,zookeeper | datanode,nodemanager,journalnode |
Notes:
1) In hadoop 2.0 there are usually two namenodes, one active and one standby. The active namenode serves clients; the standby does not, and only synchronizes the active namenode's state so that it can take over quickly if the active fails. hadoop 2.0 offers two HDFS HA solutions, NFS and QJM; QJM is used here. The active and standby namenodes share metadata through a group of journalnodes: an edit is considered written once it has reached a majority of the journalnodes, which is why an odd number of journalnodes is usually configured. A zookeeper cluster is also set up for zkfc (DFSZKFailoverController) failover: when the active namenode dies, the standby namenode is automatically switched to active.
2) hadoop 2.2.0 still had a resourcemanager single point of failure; hadoop 2.4.1 fixed it. There are two resourcemanagers, one active and one standby, with their state coordinated through zookeeper. Note that here the resourcemanagers are co-located with the namenodes; if you have the machines, keep them separate, because the resourcemanager is quite resource-hungry.
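The majority-write rule mentioned above is simple arithmetic; a quick sketch (a generic illustration, not part of the setup itself):

```shell
# Majority quorum: with n nodes (n odd), a write needs n/2 + 1 acks,
# so up to (n-1)/2 node failures are tolerated.
n=3                        # JournalNodes (and ZooKeeper servers) in this plan
quorum=$(( n / 2 + 1 ))
tolerated=$(( (n - 1) / 2 ))
echo "nodes=$n quorum=$quorum tolerated_failures=$tolerated"
# prints: nodes=3 quorum=2 tolerated_failures=1
```

This is why the plan above puts three journalnodes and three zookeeper servers on dehadp03-05: either group survives the loss of any single node.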
3. System preparation
1) Install the operating system
There is no special OS requirement; rhel 6.2 (64-bit) is used here. During installation it is best not to include a domain name in the hostname, which avoids trouble later. The minimal package selection is enough.
2) Disable selinux and iptables
[root@dehadp01 ~]# vi /etc/selinux/config
#SELINUX=enforcing
SELINUX=disabled
[root@dehadp01 ~]# /etc/init.d/iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@dehadp01 ~]# chkconfig iptables off
[root@dehadp01 ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
3) Configure /etc/hosts
[root@dehadp01 ~]# vi /etc/hosts
192.168.18.201 dehadp01
192.168.18.202 dehadp02
192.168.18.203 dehadp03
192.168.18.204 dehadp04
192.168.18.205 dehadp05
4) Configure ntp time synchronization
[root@dehadp01 ~]# vi /etc/ntp.conf
server 192.168.17.91
[root@dehadp01 ~]# /etc/init.d/ntpd start
Starting ntpd: [ OK ]
[root@dehadp01 ~]# chkconfig ntpd on
[root@dehadp01 ~]# chkconfig --list ntpd
ntpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
5) Create the user and set up passwordless ssh login
Do this on every host:
[root@dehadp01 ~]# useradd grid
[root@dehadp01 ~]# passwd grid
Changing password for user grid.
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@dehadp01 ~]# su - grid
[grid@dehadp01 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
d1:9d:45:e6:7c:a8:b0:8f:4d:f0:70:a5:14:b5:d6:e5 grid@dehadp01
The key's randomart image is:
+--[ RSA 2048]----+
| o+* .|
| . o O =.|
| . = = * E|
| . B o . |
| S . + |
| = |
| . o |
| |
| |
+-----------------+
[grid@dehadp01 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[grid@dehadp01 ~]$ chmod 600 .ssh/authorized_keys
The following operations only need to be run on dehadp01:
[grid@dehadp01 ~]$ ssh dehadp02 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'dehadp02 (192.168.18.202)' can't be established.
RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dehadp02,192.168.18.202' (RSA) to the list of known hosts.
grid@dehadp02's password:
[grid@dehadp01 ~]$ ssh dehadp03 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'dehadp03 (192.168.18.203)' can't be established.
RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dehadp03,192.168.18.203' (RSA) to the list of known hosts.
grid@dehadp03's password:
[grid@dehadp01 ~]$ ssh dehadp04 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'dehadp04 (192.168.18.204)' can't be established.
RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dehadp04,192.168.18.204' (RSA) to the list of known hosts.
grid@dehadp04's password:
[grid@dehadp01 ~]$ ssh dehadp05 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'dehadp05 (192.168.18.205)' can't be established.
RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dehadp05,192.168.18.205' (RSA) to the list of known hosts.
grid@dehadp05's password:
[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp02:/home/grid/.ssh/authorized_keys
grid@dehadp02's password:
authorized_keys 100% 1975 1.9KB/s 00:00
[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp03:/home/grid/.ssh/authorized_keys
grid@dehadp03's password:
authorized_keys 100% 1975 1.9KB/s 00:00
[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp04:/home/grid/.ssh/authorized_keys
grid@dehadp04's password:
authorized_keys 100% 1975 1.9KB/s 00:00
[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp05:/home/grid/.ssh/authorized_keys
grid@dehadp05's password:
authorized_keys 100% 1975 1.9KB/s 00:00
Verify passwordless ssh login:
[grid@dehadp01 ~]$ ssh dehadp02 date
Wed Jul 22 15:28:07 CST 2015
[grid@dehadp01 ~]$ ssh dehadp03 date
Wed Jul 22 15:28:10 CST 2015
[grid@dehadp01 ~]$ ssh dehadp04 date
Wed Jul 22 15:28:12 CST 2015
[grid@dehadp01 ~]$ ssh dehadp05 date
Wed Jul 22 15:28:15 CST 2015
6) Install the jdk and configure environment variables
[root@dehadp01 ~]# tar zxvf jdk.17.tar.gz -C /opt/
...
[root@dehadp01 ~]# mv /opt/jdk1.7.0_71/ /opt/jdk1.7
[root@dehadp01 ~]# su - grid
[grid@dehadp01 ~]$ vi .bash_profile
export JAVA_HOME=/opt/jdk1.7
export HADOOP_HOME=/home/grid/hadoop
export HBASE_HOME=/home/grid/hbase
export HIVE_HOME=/home/grid/hive
export SQOOP_HOME=/home/grid/sqoop
export FLUME_HOME=/home/grid/flume
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$FLUME_HOME/bin
4. Hadoop installation and configuration
4.1 Install the zookeeper cluster (on dehadp03)
4.1.1 Extract
[grid@dehadp03 ~]$ tar zxvf zookeeper-3.4.6.tar.gz
...
[grid@dehadp03 ~]$ mv zookeeper-3.4.6 zookeeper
4.1.2 Modify the configuration
[grid@dehadp03 ~]$ cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
[grid@dehadp03 ~]$ vi zookeeper/conf/zoo.cfg
dataDir=/home/grid/zookeeper/data
server.1=dehadp03:2888:3888
server.2=dehadp04:2888:3888
server.3=dehadp05:2888:3888
4.1.3 Create the id file
[grid@dehadp03 ~]$ mkdir -p /home/grid/zookeeper/data
[grid@dehadp03 ~]$ echo 1 > /home/grid/zookeeper/data/myid
4.1.4 Copy the configured zookeeper to the other nodes (dehadp04, dehadp05)
[grid@dehadp03 ~]$ scp -r zookeeper grid@dehadp04:/home/grid
...
[grid@dehadp03 ~]$ scp -r zookeeper grid@dehadp05:/home/grid
...
4.1.5 Modify the id files on dehadp04 and dehadp05
[grid@dehadp04 ~]$ echo 2 > /home/grid/zookeeper/data/myid
[grid@dehadp05 ~]$ echo 3 > /home/grid/zookeeper/data/myid
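The myid value on each host must match the number in that host's server.N line in zoo.cfg. A local sketch of the mapping (written under /tmp purely for illustration):

```shell
# Each server.N=host:peerPort:electionPort line in zoo.cfg pairs an id N
# with a host; that host's data/myid file must contain exactly N.
cat > /tmp/zoo_servers.cfg <<'EOF'
server.1=dehadp03:2888:3888
server.2=dehadp04:2888:3888
server.3=dehadp05:2888:3888
EOF
mapping=$(while IFS='=' read -r key val; do
  id=${key#server.}
  host=${val%%:*}
  echo "$host needs myid=$id"
done < /tmp/zoo_servers.cfg)
echo "$mapping"
```

A mismatched myid is one of the most common reasons a zookeeper node fails to join the quorum.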
4.1.6 Start zookeeper on each node (once all three are up, ./zkServer.sh status should report one leader and two followers)
[grid@dehadp03 ~]$ cd zookeeper/bin/
[grid@dehadp03 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/grid/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
4.2 Install and configure the hadoop cluster
4.2.1 Extract
[grid@dehadp01 ~]$ tar zxvf hadoop-2.6.0.tar.gz
...
[grid@dehadp01 ~]$ mv hadoop-2.6.0 hadoop
The hadoop-2.x binary tarballs available online are built for 32-bit; 64-bit native libraries have to be compiled yourself. So the 32-bit native libraries are replaced here (after replacing them, hadoop checknative -a should confirm they load).
hadoop-native-64-2.6.0.tar download:
http://pan.baidu.com/s/1kTGgWpL
[grid@dehadp01 ~]$ tar xvf hadoop-native-64-2.6.0.tar -C hadoop/lib/native/
./
./libhadoop.a
./libhadoop.so
./libhadoop.so.1.0.0
./libhadooppipes.a
./libhadooputils.a
./libhdfs.a
./libhdfs.so
./libhdfs.so.0.0.0
4.2.2 Modify hadoop-env.sh
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7
4.2.3 Modify core-site.xml
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/core-site.xml
<configuration>
<!-- Set the hdfs nameservice to masters -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/grid/hadoop/data/tmp</value>
</property>
<!-- Zookeeper quorum address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>dehadp03:2181,dehadp04:2181,dehadp05:2181</value>
</property>
</configuration>
4.2.4 Modify hdfs-site.xml
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<!-- The hdfs nameservice is masters; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- The masters nameservice has two NameNodes: dehadp01 and dehadp02 -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>dehadp01,dehadp02</value>
</property>
<!-- RPC address of dehadp01 -->
<property>
<name>dfs.namenode.rpc-address.masters.dehadp01</name>
<value>dehadp01:9000</value>
</property>
<!-- HTTP address of dehadp01 -->
<property>
<name>dfs.namenode.http-address.masters.dehadp01</name>
<value>dehadp01:50070</value>
</property>
<!-- RPC address of dehadp02 -->
<property>
<name>dfs.namenode.rpc-address.masters.dehadp02</name>
<value>dehadp02:9000</value>
</property>
<!-- HTTP address of dehadp02 -->
<property>
<name>dfs.namenode.http-address.masters.dehadp02</name>
<value>dehadp02:50070</value>
</property>
<!-- Where the NameNode shared edit log lives on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://dehadp03:8485;dehadp04:8485;dehadp05:8485/masters</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/grid/hadoop/data/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by hdfs clients -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, one per line; shell(/bin/true) is a fallback so failover can still proceed when ssh fencing fails, e.g. because the old active host is down -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless ssh -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/grid/.ssh/id_rsa</value>
</property>
<!-- sshfence connect timeout (ms) -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
4.2.5 Modify mapred-site.xml (a stock 2.6.0 tarball only ships mapred-site.xml.template; copy it to mapred-site.xml first)
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/mapred-site.xml
<configuration>
<!-- Run MapReduce on yarn -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4.2.6 Modify yarn-site.xml
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/yarn-site.xml
<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id for RM HA -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<!-- Logical ids of the two RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Host of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>dehadp01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>dehadp02</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Zookeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>dehadp03:2181,dehadp04:2181,dehadp05:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4.2.7 Modify slaves
[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/slaves
dehadp03
dehadp04
dehadp05
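The slaves file is what start-dfs.sh and start-yarn.sh iterate over when launching datanodes and nodemanagers. A dry-run sketch of that fan-out (using a throwaway copy under /tmp so nothing real is touched):

```shell
# Hypothetical dry run: print what the start scripts would do per slave host.
cat > /tmp/slaves <<'EOF'
dehadp03
dehadp04
dehadp05
EOF
plan=$(while read -r h; do
  echo "ssh $h -> start datanode, start nodemanager"
done < /tmp/slaves)
echo "$plan"
```

This is also why the file lives only on the nodes the scripts run from: the worker nodes themselves never read it.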
4.2.8 Copy the configured hadoop to the other nodes
[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp02:/home/grid
...
[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp03:/home/grid
...
[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp04:/home/grid
...
[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp05:/home/grid
...
4.3 Format and start the hadoop cluster
4.3.1 Start the journalnodes (run on dehadp03, dehadp04 and dehadp05)
[grid@dehadp03 ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/grid/hadoop/logs/hadoop-grid-journalnode-dehadp03.out
4.3.2 Format hdfs
[grid@dehadp01 ~]$ hdfs namenode -format
...
Copy the data directory generated on dehadp01 to the same path on dehadp02 (equivalently, running hdfs namenode -bootstrapStandby on dehadp02 pulls the metadata over the network):
[grid@dehadp01 ~]$ scp -r hadoop/data/ grid@dehadp02:/home/grid/hadoop
VERSION 100% 207 0.2KB/s 00:00
fsimage_0000000000000000000.md5 100% 62 0.1KB/s 00:00
fsimage_0000000000000000000 100% 351 0.3KB/s 00:00
seen_txid 100% 2 0.0KB/s 00:00
4.3.3 Format the zkfc state in zookeeper (run on dehadp01)
[grid@dehadp01 ~]$ hdfs zkfc -formatZK
...
4.3.4 Start hdfs
[grid@dehadp01 ~]$ start-dfs.sh
Starting namenodes on [dehadp02 dehadp01]
The authenticity of host 'dehadp01 (192.168.18.201)' can't be established.
RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.
Are you sure you want to continue connecting (yes/no)? dehadp02: starting namenode, logging to /home/grid/hadoop/logs/hadoop-grid-namenode-dehadp02.out
yes
dehadp01: Warning: Permanently added 'dehadp01,192.168.18.201' (RSA) to the list of known hosts.
dehadp01: starting namenode, logging to /home/grid/hadoop/logs/hadoop-grid-namenode-dehadp01.out
dehadp04: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp04.out
dehadp05: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp05.out
dehadp03: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp03.out
Starting journal nodes [dehadp03 dehadp04 dehadp05]
dehadp04: journalnode running as process 22254. Stop it first.
dehadp05: journalnode running as process 19209. Stop it first.
dehadp03: journalnode running as process 15052. Stop it first.
Starting ZK Failover Controllers on NN hosts [dehadp02 dehadp01]
dehadp01: starting zkfc, logging to /home/grid/hadoop/logs/hadoop-grid-zkfc-dehadp01.out
dehadp02: starting zkfc, logging to /home/grid/hadoop/logs/hadoop-grid-zkfc-dehadp02.out
4.3.5 Start yarn
[grid@dehadp01 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/grid/hadoop/logs/yarn-grid-resourcemanager-dehadp01.out
dehadp04: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp04.out
dehadp05: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp05.out
dehadp03: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp03.out
Note: the standby resourcemanager on dehadp02 has to be started manually:
[grid@dehadp02 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/grid/hadoop/logs/yarn-grid-resourcemanager-dehadp02.out
4.4 Check the hadoop cluster status
4.4.1 Via the web UIs
NameNode: http://dehadp01:50070 and http://dehadp02:50070 — one should show active, the other standby (screenshots omitted); hdfs haadmin -getServiceState dehadp01 reports the same thing on the command line.
ResourceManager: the web UI listens on the default port 8088; yarn rmadmin -getServiceState rm1 shows which RM is active (screenshots omitted).
4.4.2 Via hdfs commands
[grid@dehadp01 ~]$ hdfs dfsadmin -report
Configured Capacity: 112779337728 (105.03 GB)
Present Capacity: 101695242240 (94.71 GB)
DFS Remaining: 101695168512 (94.71 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.18.204:50010 (dehadp04)
Hostname: dehadp04
Decommission Status : Normal
Configured Capacity: 37593112576 (35.01 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3665035264 (3.41 GB)
DFS Remaining: 33928052736 (31.60 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.25%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jul 23 14:46:54 CST 2015
Name: 192.168.18.205:50010 (dehadp05)
Hostname: dehadp05
Decommission Status : Normal
Configured Capacity: 37593112576 (35.01 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3560112128 (3.32 GB)
DFS Remaining: 34032975872 (31.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jul 23 14:46:54 CST 2015
Name: 192.168.18.203:50010 (dehadp03)
Hostname: dehadp03
Decommission Status : Normal
Configured Capacity: 37593112576 (35.01 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3858948096 (3.59 GB)
DFS Remaining: 33734139904 (31.42 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jul 23 14:46:51 CST 2015
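One thing to keep in mind when reading this report: DFS Remaining is raw capacity. With the default dfs.replication of 3 (this setup does not override it), each block is stored three times, so the usable capacity is roughly a third of the figure shown:

```shell
# Rough effective-capacity arithmetic for the report above:
# ~94 GB raw remaining divided by a replication factor of 3.
raw_gb=94
repl=3
effective=$(( raw_gb / repl ))
echo "effective capacity: about ${effective} GB"
```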
4.5 Trying out hadoop
4.5.1 Everyday hadoop operations
[grid@dehadp01 ~]$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Upload a log file to the hadoop cluster:
[grid@dehadp01 ~]$ hadoop fs -mkdir -p /home/grid/txt
[grid@dehadp01 ~]$ hadoop fs -put hadoop/logs/hadoop-grid-namenode-dehadp01.log /home/grid/txt
Check the file just uploaded (the 3 in the listing below is its replication factor):
[grid@dehadp01 ~]$ hadoop fs -ls /home/grid/txt
Found 1 items
-rw-r--r-- 3 grid supergroup 39803 2015-07-23 14:52 /home/grid/txt/hadoop-grid-namenode-dehadp01.log
4.5.2 Run the mapreduce wordcount example, counting how many times each word appears in the file
[grid@dehadp01 ~]$ hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /home/grid/txt/hadoop-grid-namenode-dehadp01.log /home/grid/txt/woutcount_out
...
Check the result:
[grid@dehadp01 ~]$ hadoop fs -cat /home/grid/txt/woutcount_out/part-r-00000
...
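What wordcount computes can be sketched locally with standard tools (an illustration only, not how the MapReduce job is implemented): the map step tokenizes each line into (word, 1) pairs, and the reduce step sums the counts per word.

```shell
# Local equivalent of map (tokenize on whitespace) + reduce (count per word).
counts=$(printf 'the quick fox the fox\n' \
  | tr -s '[:space:]' '\n' \
  | sort | uniq -c | sort -rn)
echo "$counts"
```

The part-r-00000 file above holds exactly this kind of word<TAB>count output, one reducer's worth per file.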
Source: ITPUB blog, http://blog.itpub.net/20777547/viewspace-1745820/