(1) Setting up the Hadoop runtime environment
i) Hadoop is written in Java, so a Java environment has to be installed before Hadoop itself
1. Install the JDK. Download the latest JDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Set up the Java environment by defining the JAVA_HOME, CLASSPATH, and PATH environment variables in /etc/environment, /etc/profile, or ~/.bashrc in the home directory.
Note the difference between these three files: /etc/environment and /etc/profile apply system-wide, while ~/.bashrc only affects the current user's shells.
3. Download Hadoop. The tarball is easy to find on the Apache website; then create a hadoop user and a hadoop group, and unpack the archive into the hadoop user's home directory.
4. Set Hadoop's own environment variable: in hadoop-env.sh, under the conf directory of the Hadoop install directory, add export JAVA_HOME=<your actual JDK install directory>. (See the sketch after this list.)
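A minimal sketch of steps 2-4 on Ubuntu, assuming the JDK is installed under /usr/lib/jvm/jdk1.7.0 and the tarball is hadoop-1.2.1.tar.gz (substitute your real paths and version):
# Create the hadoop group and user (Debian/Ubuntu style)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
# Unpack Hadoop into the hadoop user's home directory and hand it over
sudo tar -xzf hadoop-1.2.1.tar.gz -C /home/hadoop
sudo mv /home/hadoop/hadoop-1.2.1 /home/hadoop/hadoop
sudo chown -R hadoop:hadoop /home/hadoop/hadoop
# As the hadoop user, add the Java environment to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
# As the hadoop user, point Hadoop at the JDK
echo 'export JAVA_HOME=/usr/lib/jvm/jdk1.7.0' >> ~/hadoop/conf/hadoop-env.sh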
ii) Hadoop needs passwordless SSH (its scripts log in to localhost to start the daemons), so install and configure SSH next:
sudo apt-get install ssh
sudo apt-get install rsync
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
iii) Configuring Hadoop in pseudo-distributed mode
Switch to the hadoop user and change into the Hadoop install directory.
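Assuming Hadoop was unpacked to /home/hadoop/hadoop as above, that means:
su - hadoop
cd ~/hadoop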
1. Edit conf/core-site.xml to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>
2. Edit conf/hdfs-site.xml to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
3. Edit conf/mapred-site.xml to read:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
4. Format HDFS (run as the hadoop user from the Hadoop install directory):
bin/hadoop namenode -format
(2) Starting Hadoop
hadoop@clebeg:~/hadoop$ bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-clebeg.out
starting jobtracker, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-jobtracker-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-tasktracker-clebeg.out
The output above shows where the log files are written (here, /home/hadoop/hadoop/logs).
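The repeated "Warning: $HADOOP_HOME is deprecated" lines are harmless on this 1.x release. If you want to silence them, one option (a sketch, assuming a Hadoop 1.x release that honors this variable) is to add the following to conf/hadoop-env.sh:
# Suppress the $HADOOP_HOME deprecation warning printed by the launch scripts
export HADOOP_HOME_WARN_SUPPRESS=1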
Check with jps whether all of the Hadoop daemons have started:
hadoop@clebeg:~/hadoop$ jps
5250 JobTracker
5407 TaskTracker
4816 NameNode
4988 DataNode
5594 Jps
5156 SecondaryNameNode
If all of the processes above appear, Hadoop is installed and running.
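Beyond jps, a quick smoke test confirms HDFS actually answers requests (the /test path is just an example); you can also browse the NameNode web UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030:
# Create a directory in HDFS and list the root to confirm the NameNode responds
bin/hadoop fs -mkdir /test
bin/hadoop fs -ls /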
If anything goes wrong, check the logs. In my case the datanode failed to start because one of its directories had the wrong permissions; fixing the permissions and re-formatting HDFS solved it.
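A minimal recovery sketch for that situation, assuming the directory layout from hdfs-site.xml above (note that re-formatting erases anything already stored in HDFS):
# Stop all daemons, fix ownership of the HDFS directories, re-format, and restart
bin/stop-all.sh
sudo chown -R hadoop:hadoop /home/hadoop/hadoop/hdfs /home/hadoop/hadoop/tmp
bin/hadoop namenode -format
bin/start-all.sh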