Hadoop Fully Distributed Mode

努力就是魅力

Category: hadoop

Article link: https://blog.youkuaiyun.com/nulijiushimeili/article/details/79310037

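PV (page view): page views
--counted once each time a page is opened
UV (unique visitor): unique visitors
--number of distinct people who visit a site within one day (identified by cookie or session)
IP (Internet Protocol)
--number of visits counted by distinct IP address
VV (visit view): number of visits per visitor
--how many times the same visitor visits the site within one day
Bounces
--the visitor opened the site, did nothing, and left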
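
To make the definitions above concrete, here is a minimal sketch that counts PV, UV, and distinct IPs from an access log. The three-column log format (IP, cookie ID, URL) and the function name `count_metrics` are assumptions made for illustration only; real logs will need their own field mapping.

```shell
# Hypothetical log format, one request per line: <ip> <cookie_id> <url>
count_metrics() {
  awk '
    NF >= 3 {
      pv++                                                      # every request is one page view
      if (!($2 in seen_cookie)) { seen_cookie[$2] = 1; uv++ }   # first time this cookie appears
      if (!($1 in seen_ip))     { seen_ip[$1] = 1; ip++ }       # first time this address appears
    }
    END { printf "PV=%d UV=%d IP=%d\n", pv, uv, ip }
  ' "$@"
}

# Example: three requests, two cookies, two IP addresses
printf '%s\n' \
  '10.0.0.1 c1 /index.html' \
  '10.0.0.1 c2 /index.html' \
  '10.0.0.2 c1 /about.html' | count_metrics
# prints: PV=3 UV=2 IP=2
```

The same function also accepts file names as arguments, for example `count_metrics access.log`.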

***************************************************************************************************

Fully distributed mode

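1. Plan the hardware: keep the machine specifications as uniform as possible.
2. Deployment steps:
Keep the directory layout identical on every machine.
Extract the Hadoop tarball and delete the doc directory under share.
Configure the JDK.
Specify the JDK (JAVA_HOME) in the following three files: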
hadoop-env.sh
mapred-env.sh
yarn-env.sh
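
Setting the JDK in these three files can be scripted. The sketch below is one possible way to do it: the helper name `set_java_home` and the JDK path in the usage comment are assumptions, not part of the original setup. It removes any existing (or commented-out) `export JAVA_HOME=` line and appends a fresh one.

```shell
# set_java_home <conf_dir> <jdk_path>
# Hypothetical helper: rewrites the JAVA_HOME export in the three env scripts.
set_java_home() {
  conf_dir=$1
  jdk=$2
  for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    file="$conf_dir/$f"
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    # Drop any existing, possibly commented-out, JAVA_HOME export.
    sed '/^[[:space:]]*#*[[:space:]]*export JAVA_HOME=/d' "$file" > "$file.tmp"
    echo "export JAVA_HOME=$jdk" >> "$file.tmp"
    # Write back through redirection so the original file permissions are kept.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
  done
}

# Usage (the JDK path is an assumed example, adjust to your install):
#   set_java_home /opt/app/hadoop-2.5.0/etc/hadoop /path/to/your/jdk
```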
Edit the following four files; the single-node configuration can be copied over:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata-senior01.ibeifeng.com:8020</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/opt/app/hadoop-2.5.0/data/tmp</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>nulijiushimeili</value>
</property>

</configuration>

hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata-senior01.ibeifeng.com:50090</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>bigdata-senior01.ibeifeng.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>bigdata-senior01.ibeifeng.com:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata-senior02.ibeifeng.com</value>
<!-- Important: note which host YARN is configured on; YARN must be started on that same machine. -->
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>100000</value>
</property>
</configuration>
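
Because YARN has to be started on the host named in `yarn.resourcemanager.hostname`, it can help to read that value back from the file before starting anything. The helper below is a minimal sketch, not a general XML parser: it assumes each `<name>` and `<value>` element sits on a single line, as in the files above, and the function name `get_prop` is hypothetical.

```shell
# get_prop <site-xml-file> <property-name>
# Prints the <value> that follows the matching <name>.
get_prop() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      val = $0
      sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}

# Usage, run from the Hadoop install directory:
#   get_prop etc/hadoop/yarn-site.xml yarn.resourcemanager.hostname
```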
Edit the hostnames in the slaves file (one line for each machine in the cluster):
bigdata-senior01.ibeifeng.com
bigdata-senior02.ibeifeng.com
bigdata-senior03.ibeifeng.com
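3. Distribute the configured installation from this machine to the other machines (be sure to delete the doc directory before copying).
Before copying, check the hostname mappings in /etc/hosts (more /etc/hosts); do this on every machine.
While you are at it, replace the native libraries in hadoop-2.5.0/lib/native.
Command: scp -r hadoop-2.5.0/ bigdata-senior02.ibeifeng.com:/opt/app/
Command: scp -r hadoop-2.5.0/ bigdata-senior03.ibeifeng.com:/opt/app/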
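
With more than a couple of machines, the scp commands above can be generated from the slaves file instead of typed by hand. This is a sketch under stated assumptions: the helper name `distribute_cmds` is hypothetical, and it only prints the commands so they can be reviewed before being run.

```shell
# distribute_cmds <slaves_file> <local_host> <src_dir> <dest_dir>
# Prints one scp command per host in the slaves file, skipping the local machine.
distribute_cmds() {
  slaves_file=$1
  local_host=$2
  src=$3
  dest=$4
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in
      ''|"$local_host") continue ;;   # skip blank lines and the source machine
    esac
    echo "scp -r $src $host:$dest"
  done < "$slaves_file"
}

# Usage, from /opt/app on the first machine; append "| sh" to actually run the copies:
#   distribute_cmds hadoop-2.5.0/etc/hadoop/slaves bigdata-senior01.ibeifeng.com hadoop-2.5.0/ /opt/app/
```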
4. Format the NameNode (the directory /opt/app/hadoop-2.5.0/data/tmp must exist).
The NameNode runs on only one machine, so format it on that master node only:
bin/hdfs namenode -format
5. Start the HDFS daemons on each machine.
On bigdata-senior01.ibeifeng.com:
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
On bigdata-senior02.ibeifeng.com:
sbin/hadoop-daemon.sh start datanode
On bigdata-senior03.ibeifeng.com:
sbin/hadoop-daemon.sh start datanode
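
One way to confirm the daemons actually came up is the JDK's `jps` tool, which lists running Java processes as `<pid> <main class>`. The helper below is a small sketch around that output; the function name `missing_daemons` is hypothetical, and it reads the `jps` listing from standard input so it can be checked without a running cluster.

```shell
# missing_daemons <daemon>...
# Reads `jps` output on stdin and prints each expected daemon that is not listed.
missing_daemons() {
  listing=$(cat)
  for d in "$@"; do
    # Exact match on the second column, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$listing" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$d"
  done
}

# Usage on bigdata-senior01 (prints nothing when both daemons are running):
#   jps | missing_daemons NameNode DataNode
```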
6. Check the number of DataNodes in the NameNode web UI:
http://192.168.120.100:50070/dfshealth.html#tab-datanode
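7. Test the cluster. The wordcount job below needs YARN to be running; if it is not up yet, start it on the ResourceManager host (bigdata-senior02.ibeifeng.com) with sbin/start-yarn.sh.
// Create a directory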
bin/hdfs dfs -mkdir -p /test/input
// Upload a file
bin/hdfs dfs -put /opt/datas/input.txt /test/input/
// Run the word count example
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /test/input/ /test/output/
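// View the result (the job writes part files inside the output directory)
bin/hdfs dfs -text /test/output/part*
// Stop the NameNode and the DataNodes on all machines in one go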
sbin/stop-dfs.sh
// Stop YARN on the second machine; stop it on whichever machine YARN is configured on
sbin/stop-yarn.sh
// Stop the history server
sbin/mr-jobhistory-daemon.sh stop historyserver
8. Configure passwordless SSH login on every machine.
Run the following on every machine:
ssh-keygen -t rsa
ssh-copy-id <hostname>
If passwordless login does not take effect, delete everything and redo these steps.
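
To check whether passwordless login really works before relying on scripts such as sbin/stop-dfs.sh, each host can be probed non-interactively. The sketch below uses OpenSSH's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password; the function name `check_passwordless` and the 5-second timeout are assumptions.

```shell
# check_passwordless <host>...
# Reports, per host, whether ssh can log in without asking for a password.
check_passwordless() {
  rc=0
  for host in "$@"; do
    # BatchMode=yes: never prompt; a missing or rejected key becomes an error.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   check_passwordless bigdata-senior01.ibeifeng.com bigdata-senior02.ibeifeng.com bigdata-senior03.ibeifeng.com
```

The function returns a non-zero status if any host fails, so it can gate later steps in a script.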