Hadoop 2.6 HA Cluster Installation and Configuration

1. Version Selection

hadoop-2.6.0.tar.gz

zookeeper-3.4.6.tar.gz

 
This article draws mainly on: http://www.aboutyun.com/thread-11909-1-1.html

2. Cluster Planning

Hostname    IP Address        Installed Software       Running Processes
dehadp01    192.168.18.201    jdk, hadoop              namenode, resourcemanager, zkfc
dehadp02    192.168.18.202    jdk, hadoop              namenode, resourcemanager, zkfc
dehadp03    192.168.18.203    jdk, hadoop, zookeeper   datanode, nodemanager, journalnode
dehadp04    192.168.18.204    jdk, hadoop, zookeeper   datanode, nodemanager, journalnode
dehadp05    192.168.18.205    jdk, hadoop, zookeeper   datanode, nodemanager, journalnode

 

Notes:

1) In hadoop 2.0, HDFS HA typically uses two namenodes, one active and one standby. Only the active namenode serves client requests; the standby provides no service and merely mirrors the active's state, so that a fast failover is possible when the active fails. hadoop 2.0 offers two HDFS HA solutions, NFS and QJM; QJM is used here. The two namenodes synchronize metadata through a group of journalnodes: an edit counts as committed once it has been written to a majority of them, which is why an odd number of journalnodes is normally deployed. A zookeeper cluster is also configured here for zkfc (DFSZKFailoverController) failover: when the active namenode goes down, the standby is switched to active automatically.

2) hadoop 2.2.0 still had a single point of failure in the resourcemanager; hadoop 2.4.1 fixed this. There are two resourcemanagers, one active and one standby, with their state coordinated through zookeeper. Note that the resourcemanagers are co-located with the namenodes here; if resources allow, it is better to put them on separate machines, as the resourcemanager is quite resource-hungry.

3. System Preparation

1) Install the operating system

There are no special requirements for the operating system; RHEL 6.2 (64-bit) is used here. During installation it is best not to append a domain name to the hostname, otherwise referring to the machines later becomes awkward. The minimal package set is enough.

 

2) Disable SELinux and iptables

[root@dehadp01 ~]# vi /etc/selinux/config

#SELINUX=enforcing

SELINUX=disabled
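
Changing /etc/selinux/config only takes effect after a reboot; to also switch SELinux to permissive mode in the running session right away, the standard RHEL command is:

[root@dehadp01 ~]# setenforce 0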

 

[root@dehadp01 ~]# /etc/init.d/iptables stop

iptables: Flushing firewall rules:                         [  OK  ]

iptables: Setting chains to policy ACCEPT: filter          [  OK  ]

iptables: Unloading modules:                               [  OK  ]

[root@dehadp01 ~]# chkconfig iptables off

[root@dehadp01 ~]# chkconfig --list iptables

iptables        0:off 1:off 2:off 3:off 4:off 5:off 6:off

 

3) Configure /etc/hosts

[root@dehadp01 ~]# vi /etc/hosts

192.168.18.201  dehadp01

192.168.18.202  dehadp02

192.168.18.203  dehadp03

192.168.18.204  dehadp04

192.168.18.205  dehadp05
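
The same entries are needed on every node. One way to avoid editing the file five times (a small convenience sketch, assuming the root password is available on each host) is to push it from dehadp01:

[root@dehadp01 ~]# for h in dehadp02 dehadp03 dehadp04 dehadp05; do scp /etc/hosts root@$h:/etc/hosts; done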

 

4) Configure NTP time synchronization

[root@dehadp01 ~]# vi /etc/ntp.conf

server 192.168.17.91

 

[root@dehadp01 ~]# /etc/init.d/ntpd start

Starting ntpd:                                             [  OK  ]

[root@dehadp01 ~]# chkconfig ntpd on

[root@dehadp01 ~]# chkconfig --list ntpd

ntpd            0:off 1:off 2:on 3:on 4:on 5:on 6:off
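
To confirm that ntpd is actually syncing against 192.168.17.91, check the peer status; a * in front of a peer marks the currently selected time source:

[root@dehadp01 ~]# ntpq -p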

 

5) Create the user and configure passwordless SSH

Do the following on every host:

[root@dehadp01 ~]# useradd grid

[root@dehadp01 ~]# passwd grid

Changing password for user grid.

New password: 

BAD PASSWORD: it is too short

BAD PASSWORD: is too simple

Retype new password: 

passwd: all authentication tokens updated successfully.

[root@dehadp01 ~]# su - grid

[grid@dehadp01 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/grid/.ssh/id_rsa): 

Created directory '/home/grid/.ssh'.

Enter passphrase (empty for no passphrase): 

Enter same passphrase again: 

Your identification has been saved in /home/grid/.ssh/id_rsa.

Your public key has been saved in /home/grid/.ssh/id_rsa.pub.

The key fingerprint is:

d1:9d:45:e6:7c:a8:b0:8f:4d:f0:70:a5:14:b5:d6:e5 grid@dehadp01

The key's randomart image is:

+--[ RSA 2048]----+

|            o+* .|

|         . o O =.|

|        . = = * E|

|         . B o . |

|        S . +    |

|           =     |

|          . o    |

|                 |

|                 |

+-----------------+

[grid@dehadp01 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

[grid@dehadp01 ~]$ chmod 600 .ssh/authorized_keys

 

The following only needs to be run on dehadp01. It appends each node's public key to dehadp01's authorized_keys, then distributes the merged file back to all nodes:

[grid@dehadp01 ~]$ ssh dehadp02 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

The authenticity of host 'dehadp02 (192.168.18.202)' can't be established.

RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'dehadp02,192.168.18.202' (RSA) to the list of known hosts.

grid@dehadp02's password: 

[grid@dehadp01 ~]$ ssh dehadp03 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

The authenticity of host 'dehadp03 (192.168.18.203)' can't be established.

RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'dehadp03,192.168.18.203' (RSA) to the list of known hosts.

grid@dehadp03's password: 

[grid@dehadp01 ~]$ ssh dehadp04 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

The authenticity of host 'dehadp04 (192.168.18.204)' can't be established.

RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'dehadp04,192.168.18.204' (RSA) to the list of known hosts.

grid@dehadp04's password: 

[grid@dehadp01 ~]$ ssh dehadp05 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

The authenticity of host 'dehadp05 (192.168.18.205)' can't be established.

RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'dehadp05,192.168.18.205' (RSA) to the list of known hosts.

grid@dehadp05's password: 

[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp02:/home/grid/.ssh/authorized_keys

grid@dehadp02's password: 

authorized_keys                                                                                                                                  100% 1975     1.9KB/s   00:00    

[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp03:/home/grid/.ssh/authorized_keys

grid@dehadp03's password: 

authorized_keys                                                                                                                                  100% 1975     1.9KB/s   00:00    

[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp04:/home/grid/.ssh/authorized_keys

grid@dehadp04's password: 

authorized_keys                                                                                                                                  100% 1975     1.9KB/s   00:00    

[grid@dehadp01 ~]$ scp ~/.ssh/authorized_keys dehadp05:/home/grid/.ssh/authorized_keys

grid@dehadp05's password: 

authorized_keys                                                                                                                                  100% 1975     1.9KB/s   00:00 

 

Verify passwordless ssh:

[grid@dehadp01 ~]$ ssh dehadp02 date

Wed Jul 22 15:28:07 CST 2015

[grid@dehadp01 ~]$ ssh dehadp03 date

Wed Jul 22 15:28:10 CST 2015

[grid@dehadp01 ~]$ ssh dehadp04 date

Wed Jul 22 15:28:12 CST 2015

[grid@dehadp01 ~]$ ssh dehadp05 date

Wed Jul 22 15:28:15 CST 2015

 

6) Install the JDK and configure environment variables

[root@dehadp01 ~]# tar zxvf jdk1.7.tar.gz -C /opt/

...

[root@dehadp01 ~]# mv /opt/jdk1.7.0_71/ /opt/jdk1.7

[root@dehadp01 ~]# su - grid

[grid@dehadp01 ~]$ vi .bash_profile 

export JAVA_HOME=/opt/jdk1.7

export HADOOP_HOME=/home/grid/hadoop

export HBASE_HOME=/home/grid/hbase

export HIVE_HOME=/home/grid/hive

export SQOOP_HOME=/home/grid/sqoop

export FLUME_HOME=/home/grid/flume

 

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$FLUME_HOME/bin
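
Reload the profile and confirm the JDK is picked up (the HBASE/HIVE/SQOOP/FLUME variables are for components installed later and are harmless for now):

[grid@dehadp01 ~]$ source ~/.bash_profile
[grid@dehadp01 ~]$ java -version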

 

4. Hadoop Installation and Configuration

4.1 Install the ZooKeeper Cluster (on dehadp03)

4.1.1 Extract

[grid@dehadp03 ~]$ tar zxvf zookeeper-3.4.6.tar.gz

...

[grid@dehadp03 ~]$ mv zookeeper-3.4.6 zookeeper

 

4.1.2 Edit the configuration

[grid@dehadp03 ~]$ cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg

[grid@dehadp03 ~]$ vi zookeeper/conf/zoo.cfg

 

dataDir=/home/grid/zookeeper/data

 

server.1=dehadp03:2888:3888

server.2=dehadp04:2888:3888

server.3=dehadp05:2888:3888

 

4.1.3 Create the id file

[grid@dehadp03 ~]$ mkdir -p /home/grid/zookeeper/data

[grid@dehadp03 ~]$ echo 1 > /home/grid/zookeeper/data/myid

 

4.1.4 Copy the configured zookeeper directory to the other nodes (dehadp04 and dehadp05)

[grid@dehadp03 ~]$ scp -r zookeeper grid@dehadp04:/home/grid

...

[grid@dehadp03 ~]$ scp -r zookeeper grid@dehadp05:/home/grid

...

 

4.1.5 Set the id file on dehadp04 and dehadp05

[grid@dehadp04 ~]$ echo 2 > /home/grid/zookeeper/data/myid

 

[grid@dehadp05 ~]$ echo 3 > /home/grid/zookeeper/data/myid

 

4.1.6 Start zookeeper on each node

[grid@dehadp03 ~]$ cd zookeeper/bin/

[grid@dehadp03 bin]$ ./zkServer.sh start

JMX enabled by default

Using config: /home/grid/zookeeper/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED
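
Once zookeeper has been started on all three nodes, each node's role can be checked; one should report Mode: leader and the other two Mode: follower:

[grid@dehadp03 bin]$ ./zkServer.sh status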

 

4.2 Install and Configure the Hadoop Cluster

4.2.1 Extract

[grid@dehadp01 ~]$ tar zxvf hadoop-2.6.0.tar.gz

...

[grid@dehadp01 ~]$ mv hadoop-2.6.0 hadoop

 

The hadoop 2.x binary tarballs available for download online are all built 32-bit; a 64-bit build has to be compiled yourself. So the 32-bit native libraries also need to be replaced.

hadoop-native-64-2.6.0.tar download: http://pan.baidu.com/s/1kTGgWpL

[grid@dehadp01 ~]$ tar xvf hadoop-native-64-2.6.0.tar -C hadoop/lib/native/

./

./libhadoop.a

./libhadoop.so

./libhadoop.so.1.0.0

./libhadooppipes.a

./libhadooputils.a

./libhdfs.a

./libhdfs.so

./libhdfs.so.0.0.0
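
After the environment variables from step 3.6 are in place, whether the replaced 64-bit native libraries load correctly can be verified with the standard check command:

[grid@dehadp01 ~]$ hadoop checknative -a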

 

4.2.2 Edit hadoop-env.sh

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/jdk1.7

 

4.2.3 Edit core-site.xml

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/core-site.xml 

<configuration>

<!-- Set the HDFS nameservice to masters -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://masters</value>

</property>

<!-- Hadoop temporary directory -->

<property>

<name>hadoop.tmp.dir</name>

<value>/home/grid/hadoop/data/tmp</value>

</property>

<!-- ZooKeeper quorum addresses -->

<property>

<name>ha.zookeeper.quorum</name>

<value>dehadp03:2181,dehadp04:2181,dehadp05:2181</value>

</property>

</configuration>

 

4.2.4 Edit hdfs-site.xml

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/hdfs-site.xml 

<configuration>

<!-- HDFS nameservice masters; must match the value in core-site.xml -->

<property>

<name>dfs.nameservices</name>

<value>masters</value>

</property>

<!-- The masters nameservice contains two NameNodes, dehadp01 and dehadp02 -->

<property>

<name>dfs.ha.namenodes.masters</name>

<value>dehadp01,dehadp02</value>

</property>

<!-- RPC address of dehadp01 -->

<property>

<name>dfs.namenode.rpc-address.masters.dehadp01</name>

<value>dehadp01:9000</value>

</property>

<!-- HTTP address of dehadp01 -->

<property>

<name>dfs.namenode.http-address.masters.dehadp01</name>

<value>dehadp01:50070</value>

</property>

<!-- RPC address of dehadp02 -->

<property>

<name>dfs.namenode.rpc-address.masters.dehadp02</name>

<value>dehadp02:9000</value>

</property>

<!-- HTTP address of dehadp02 -->

<property>

<name>dfs.namenode.http-address.masters.dehadp02</name>

<value>dehadp02:50070</value>

</property>

<!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://dehadp03:8485;dehadp04:8485;dehadp05:8485/masters</value>

</property>

<!-- Local directory where the JournalNodes keep their data -->

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/home/grid/hadoop/data/journal</value>

</property>

<!-- Enable automatic NameNode failover -->

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<!-- Failover proxy provider used by HDFS clients -->

<property>

<name>dfs.client.failover.proxy.provider.masters</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<!-- Fencing methods; multiple methods are separated by newlines, one method per line -->

<property>

<name>dfs.ha.fencing.methods</name>

<value>

sshfence

shell(/bin/true)

</value>

</property>

<!-- sshfence requires passwordless SSH -->

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/grid/.ssh/id_rsa</value>

</property>

<!-- sshfence connect timeout (milliseconds) -->

<property>

<name>dfs.ha.fencing.ssh.connect-timeout</name>

<value>30000</value>

</property>

</configuration>

 

4.2.5 Edit mapred-site.xml
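
The 2.6.0 distribution ships this file only as a template, so create it first:

[grid@dehadp01 ~]$ cp hadoop/etc/hadoop/mapred-site.xml.template hadoop/etc/hadoop/mapred-site.xml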

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/mapred-site.xml

<configuration>

<!-- Run MapReduce on the YARN framework -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

 

4.2.6 Edit yarn-site.xml

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/yarn-site.xml

<configuration>

<!-- Enable ResourceManager HA -->

<property>

<name>yarn.resourcemanager.ha.enabled</name>

<value>true</value>

</property>

<!-- ResourceManager cluster id -->

<property>

<name>yarn.resourcemanager.cluster-id</name>

<value>RM_HA_ID</value>

</property>

<!-- Logical ids for the two ResourceManagers -->

<property>

<name>yarn.resourcemanager.ha.rm-ids</name>

<value>rm1,rm2</value>

</property>

<!-- Hostname of each ResourceManager -->

<property>

<name>yarn.resourcemanager.hostname.rm1</name>

<value>dehadp01</value>

</property>

<property>

<name>yarn.resourcemanager.hostname.rm2</name>

<value>dehadp02</value>

</property>
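
<!-- Enable RM recovery and persist its state in ZooKeeper -->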

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

</property>

<!-- ZooKeeper quorum addresses -->

<property>

<name>yarn.resourcemanager.zk-address</name>

<value>dehadp03:2181,dehadp04:2181,dehadp05:2181</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

 

4.2.7 Edit slaves

[grid@dehadp01 ~]$ vi hadoop/etc/hadoop/slaves

dehadp03

dehadp04

dehadp05

 

4.2.8 Copy the configured hadoop directory to the other nodes

[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp02:/home/grid

...

[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp03:/home/grid

...

[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp04:/home/grid

...

[grid@dehadp01 ~]$ scp -r hadoop grid@dehadp05:/home/grid

...

 

4.3 Format and Start the Hadoop Cluster

4.3.1 Start the journalnodes (run on dehadp03, dehadp04 and dehadp05)

[grid@dehadp03 ~]$ hadoop-daemon.sh start journalnode

starting journalnode, logging to /home/grid/hadoop/logs/hadoop-grid-journalnode-dehadp03.out

 

4.3.2 Format HDFS

[grid@dehadp01 ~]$ hdfs namenode -format

...

 

Copy the data directory generated on dehadp01 to the same path on dehadp02:

[grid@dehadp01 ~]$ scp -r hadoop/data/ grid@dehadp02:/home/grid/hadoop

VERSION                                                                                                                                          100%  207     0.2KB/s   00:00    

fsimage_0000000000000000000.md5                                                                                                                  100%   62     0.1KB/s   00:00    

fsimage_0000000000000000000                                                                                                                      100%  351     0.3KB/s   00:00    

seen_txid                                                                                                                                        100%    2     0.0KB/s   00:00  
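
Equivalent alternative: instead of copying the directory by hand, the standby can bootstrap itself. With the freshly formatted namenode on dehadp01 already started, run on dehadp02:

[grid@dehadp02 ~]$ hdfs namenode -bootstrapStandby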

 

4.3.3 Format ZK (run on dehadp01)

[grid@dehadp01 ~]$ hdfs zkfc -formatZK

...

 

4.3.4 Start HDFS

[grid@dehadp01 ~]$ start-dfs.sh 

Starting namenodes on [dehadp02 dehadp01]

The authenticity of host 'dehadp01 (192.168.18.201)' can't be established.

RSA key fingerprint is 1c:16:df:b0:13:11:47:15:dc:5f:24:94:85:af:33:76.

Are you sure you want to continue connecting (yes/no)? dehadp02: starting namenode, logging to /home/grid/hadoop/logs/hadoop-grid-namenode-dehadp02.out

yes

dehadp01: Warning: Permanently added 'dehadp01,192.168.18.201' (RSA) to the list of known hosts.

dehadp01: starting namenode, logging to /home/grid/hadoop/logs/hadoop-grid-namenode-dehadp01.out

dehadp04: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp04.out

dehadp05: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp05.out

dehadp03: starting datanode, logging to /home/grid/hadoop/logs/hadoop-grid-datanode-dehadp03.out

Starting journal nodes [dehadp03 dehadp04 dehadp05]

dehadp04: journalnode running as process 22254. Stop it first.

dehadp05: journalnode running as process 19209. Stop it first.

dehadp03: journalnode running as process 15052. Stop it first.

Starting ZK Failover Controllers on NN hosts [dehadp02 dehadp01]

dehadp01: starting zkfc, logging to /home/grid/hadoop/logs/hadoop-grid-zkfc-dehadp01.out

dehadp02: starting zkfc, logging to /home/grid/hadoop/logs/hadoop-grid-zkfc-dehadp02.out

 

4.3.5 Start YARN

[grid@dehadp01 ~]$ start-yarn.sh 

starting yarn daemons

starting resourcemanager, logging to /home/grid/hadoop/logs/yarn-grid-resourcemanager-dehadp01.out

dehadp04: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp04.out

dehadp05: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp05.out

dehadp03: starting nodemanager, logging to /home/grid/hadoop/logs/yarn-grid-nodemanager-dehadp03.out

 

Note: the standby resourcemanager on dehadp02 has to be started manually:

[grid@dehadp02 ~]$ yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /home/grid/hadoop/logs/yarn-grid-resourcemanager-dehadp02.out
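
At this point each machine should be running the daemons from the planning table; a quick way to check is jps on every node. Expect NameNode, DFSZKFailoverController and ResourceManager on dehadp01/dehadp02, and DataNode, NodeManager, JournalNode and QuorumPeerMain on dehadp03-05:

[grid@dehadp01 ~]$ jps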

 

4.4 Check the Hadoop Cluster Status

4.4.1 Check the cluster status via the web UIs

View the namenodes:

http://dehadp01:50070/

http://dehadp02:50070/

View the resourcemanagers:

http://dehadp01:8088/

http://dehadp02:8088/

 

4.4.2 Check the cluster status with the hdfs command

[grid@dehadp01 ~]$ hdfs dfsadmin -report

Configured Capacity: 112779337728 (105.03 GB)

Present Capacity: 101695242240 (94.71 GB)

DFS Remaining: 101695168512 (94.71 GB)

DFS Used: 73728 (72 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

 

-------------------------------------------------

Live datanodes (3):

 

Name: 192.168.18.204:50010 (dehadp04)

Hostname: dehadp04

Decommission Status : Normal

Configured Capacity: 37593112576 (35.01 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3665035264 (3.41 GB)

DFS Remaining: 33928052736 (31.60 GB)

DFS Used%: 0.00%

DFS Remaining%: 90.25%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jul 23 14:46:54 CST 2015

 

 

Name: 192.168.18.205:50010 (dehadp05)

Hostname: dehadp05

Decommission Status : Normal

Configured Capacity: 37593112576 (35.01 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3560112128 (3.32 GB)

DFS Remaining: 34032975872 (31.70 GB)

DFS Used%: 0.00%

DFS Remaining%: 90.53%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jul 23 14:46:54 CST 2015

 

 

Name: 192.168.18.203:50010 (dehadp03)

Hostname: dehadp03

Decommission Status : Normal

Configured Capacity: 37593112576 (35.01 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3858948096 (3.59 GB)

DFS Remaining: 33734139904 (31.42 GB)

DFS Used%: 0.00%

DFS Remaining%: 89.73%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jul 23 14:46:51 CST 2015
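
The HA state can also be queried directly; the service ids follow dfs.ha.namenodes.masters and yarn.resourcemanager.ha.rm-ids from the configuration above. One namenode and one resourcemanager should report active and the others standby; killing the active namenode process should flip the standby to active within a few seconds, which is exactly what zkfc is for:

[grid@dehadp01 ~]$ hdfs haadmin -getServiceState dehadp01
[grid@dehadp01 ~]$ hdfs haadmin -getServiceState dehadp02
[grid@dehadp01 ~]$ yarn rmadmin -getServiceState rm1
[grid@dehadp01 ~]$ yarn rmadmin -getServiceState rm2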

 

4.5 Try Out Hadoop

4.5.1 Routine hadoop operations

[grid@dehadp01 ~]$ hadoop fs

Usage: hadoop fs [generic options]

[-appendToFile <localsrc> ... <dst>]

[-cat [-ignoreCrc] <src> ...]

[-checksum <src> ...]

[-chgrp [-R] GROUP PATH...]

[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

[-chown [-R] [OWNER][:[GROUP]] PATH...]

[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]

[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

[-count [-q] [-h] <path> ...]

[-cp [-f] [-p | -p[topax]] <src> ... <dst>]

[-createSnapshot <snapshotDir> [<snapshotName>]]

[-deleteSnapshot <snapshotDir> <snapshotName>]

[-df [-h] [<path> ...]]

[-du [-s] [-h] <path> ...]

[-expunge]

[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

[-getfacl [-R] <path>]

[-getfattr [-R] {-n name | -d} [-e en] <path>]

[-getmerge [-nl] <src> <localdst>]

[-help [cmd ...]]

[-ls [-d] [-h] [-R] [<path> ...]]

[-mkdir [-p] <path> ...]

[-moveFromLocal <localsrc> ... <dst>]

[-moveToLocal <src> <localdst>]

[-mv <src> ... <dst>]

[-put [-f] [-p] [-l] <localsrc> ... <dst>]

[-renameSnapshot <snapshotDir> <oldName> <newName>]

[-rm [-f] [-r|-R] [-skipTrash] <src> ...]

[-rmdir [--ignore-fail-on-non-empty] <dir> ...]

[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]

[-setfattr {-n name [-v value] | -x name} <path>]

[-setrep [-R] [-w] <rep> <path> ...]

[-stat [format] <path> ...]

[-tail [-f] <file>]

[-test -[defsz] <path>]

[-text [-ignoreCrc] <src> ...]

[-touchz <path> ...]

[-usage [cmd ...]]

 

Generic options supported are

-conf <configuration file>     specify an application configuration file

-D <property=value>            use value for given property

-fs <local|namenode:port>      specify a namenode

-jt <local|resourcemanager:port>    specify a ResourceManager

-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster

-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.

-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

 

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

 

Upload a log file into the hadoop cluster:

[grid@dehadp01 ~]$ hadoop fs -mkdir -p /home/grid/txt

[grid@dehadp01 ~]$ hadoop fs -put hadoop/logs/hadoop-grid-namenode-dehadp01.log /home/grid/txt

 

Look at the file just uploaded:

[grid@dehadp01 ~]$ hadoop fs -ls /home/grid/txt

Found 1 items

-rw-r--r--   3 grid supergroup      39803 2015-07-23 14:52 /home/grid/txt/hadoop-grid-namenode-dehadp01.log

 

4.5.2 Run the mapreduce example wordcount to count how many times each word appears in the file

[grid@dehadp01 ~]$ hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /home/grid/txt/hadoop-grid-namenode-dehadp01.log /home/grid/txt/woutcount_out

...

 

View the result:

[grid@dehadp01 ~]$ hadoop fs -cat /home/grid/txt/woutcount_out/part-r-00000

...

Originally published on the ITPUB blog: http://blog.itpub.net/20777547/viewspace-1745820/
