The best place to learn big data tooling is the official Apache site: http://www.apache.org/
Disclaimer: much of this material was gathered piece by piece from the web. Thanks to all the predecessors for their quiet contributions; there are too many sources to thank individually. If anything here infringes your rights, please contact the author or report it to the platform and I will remove it immediately. Shared purely for learning.
Hadoop 2.8.1 pseudo-distributed environment setup
Building on part 02 (see https://mp.youkuaiyun.com/postedit/90669594), we continue by setting up MapReduce and YARN.
The official reference documentation is at:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
1. cd /opt/module/hadoop-2.8.1
2. vi etc/hadoop/mapred-site.xml
Add the following property inside the <configuration> element:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
3. vi etc/hadoop/yarn-site.xml  (note: the shuffle property below belongs in yarn-site.xml, not mapred-site.xml)
Add the following property inside the <configuration> element:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
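The two edits above can be sketched as heredocs. This is a minimal illustration that writes to a temporary directory so it can be tried safely; on a real node the target directory is $HADOOP_HOME/etc/hadoop (an assumption about your install layout).

```shell
# Write both config files to a scratch dir (real target: $HADOOP_HOME/etc/hadoop).
conf_dir=$(mktemp -d)

cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

grep -q 'mapreduce.framework.name' "$conf_dir/mapred-site.xml" && echo "mapred-site.xml ok"
grep -q 'mapreduce_shuffle' "$conf_dir/yarn-site.xml" && echo "yarn-site.xml ok"
```

Keeping each property in its own file matters: the framework setting is read by job clients (mapred-site.xml), while the shuffle service is loaded by the NodeManager (yarn-site.xml).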
4. Start the YARN daemons (ResourceManager and NodeManager):
start-yarn.sh
5. jps
Check the running Java processes; all of the following daemons should be present:
3687 Jps
3335 ResourceManager
2153 NameNode
2281 DataNode
2443 SecondaryNameNode
3438 NodeManager
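A quick sanity check on that jps output can be scripted. The sketch below runs against the sample output captured above so it is self-contained; on a live node you would replace the literal with jps_out=$(jps).

```shell
# Verify every daemon a pseudo-distributed HDFS+YARN node needs is present.
# Sample taken from the jps listing above; on a real node use: jps_out=$(jps)
jps_out='3687 Jps
3335 ResourceManager
2153 NameNode
2281 DataNode
2443 SecondaryNameNode
3438 NodeManager'

missing=0
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if ! printf '%s\n' "$jps_out" | grep -qw "$daemon"; then
    echo "MISSING: $daemon"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all daemons running"
```

If ResourceManager or NodeManager is missing, re-check yarn-site.xml and the start-yarn.sh console output.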
MapReduce is a distributed computing framework; with the configuration above, its jobs are submitted to YARN for scheduling.
6. Test the setup with one of the official example jobs.
Reference blog post:
https://blog.youkuaiyun.com/u012343297/article/details/79978526
Run the bundled WordCount example:
hadoop jar /opt/module/hadoop-2.8.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar wordcount /1.txt /output
/1.txt is a file in HDFS containing whitespace-separated words, e.g. aaaa f f s d ad a das da da d ad a d
/output must be a directory that does not yet exist; otherwise the job aborts with a FileAlreadyExistsException (remove a stale one with hdfs dfs -rm -r /output before re-running).
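What WordCount does can be mimicked with a plain shell pipeline. This is only a rough single-machine analogy of the map (split into words), shuffle (sort groups equal keys), and reduce (count each group) stages, not how Hadoop actually executes the job:

```shell
# map: one word per line; shuffle: sort brings equal words together;
# reduce: uniq -c counts each group. Single-machine analogy only.
printf 'aaaa f f s d ad a das da da d ad a d\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}'
# → a 2, aaaa 1, ad 2, d 3, da 2, das 1, f 2, s 1
```

Hadoop does the same thing, but the map and reduce stages run as separate tasks across the cluster, with the shuffle moving map output to reducers over the network.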
Execution log:
19/05/30 00:29:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/05/30 00:29:13 INFO input.FileInputFormat: Total input files to process : 1
19/05/30 00:29:13 INFO mapreduce.JobSubmitter: number of splits:1
19/05/30 00:29:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559144078198_0001
19/05/30 00:29:14 INFO impl.YarnClientImpl: Submitted application application_1559144078198_0001
19/05/30 00:29:14 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1559144078198_0001/
19/05/30 00:29:14 INFO mapreduce.Job: Running job: job_1559144078198_0001
19/05/30 00:29:24 INFO mapreduce.Job: Job job_1559144078198_0001 running in uber mode : false
19/05/30 00:29:24 INFO mapreduce.Job: map 0% reduce 0%
19/05/30 00:29:32 INFO mapreduce.Job: map 100% reduce 0%
19/05/30 00:29:38 INFO mapreduce.Job: map 100% reduce 100%
19/05/30 00:29:39 INFO mapreduce.Job: Job job_1559144078198_0001 completed successfully
19/05/30 00:29:39 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=55
FILE: Number of bytes written=272557
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=122
HDFS: Number of bytes written=25
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4760
Total time spent by all reduces in occupied slots (ms)=3669
Total time spent by all map tasks (ms)=4760
Total time spent by all reduce tasks (ms)=3669
Total vcore-milliseconds taken by all map tasks=4760
Total vcore-milliseconds taken by all reduce tasks=3669
Total megabyte-milliseconds taken by all map tasks=4874240
Total megabyte-milliseconds taken by all reduce tasks=3757056
Map-Reduce Framework
Map input records=15
Map output records=14
Map output bytes=85
Map output materialized bytes=55
Input split bytes=92
Combine input records=14
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=55
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=216
CPU time spent (ms)=1230
Physical memory (bytes) snapshot=426430464
Virtual memory (bytes) snapshot=4178669568
Total committed heap usage (bytes)=280494080
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=30
File Output Format Counters
Bytes Written=25
Check the output files:
[root@hadoop opt]# hdfs dfs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 1 root supergroup 30 2019-05-30 00:25 /1.txt
drwxr-xr-x - root supergroup 0 2019-05-30 00:29 /output
-rw-r--r-- 1 root supergroup 0 2019-05-30 00:29 /output/_SUCCESS
-rw-r--r-- 1 root supergroup 25 2019-05-30 00:29 /output/part-r-00000
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging
drwxr-xr-x - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx--- - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root
-rwxrwx--- 1 root supergroup 33561 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001-1559147353861-root-word+count-1559147377370-1-1-SUCCEEDED-default-1559147363243.jhist
-rwxrwx--- 1 root supergroup 347 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001.summary
-rwxrwx--- 1 root supergroup 134620 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001_conf.xml
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/root
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/root/.staging
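Note the empty _SUCCESS file in /output: it is a marker written only when the job completes successfully, and scripts typically gate on it before reading the part-r-* files. The sketch below simulates that check against a local directory (a real job would test the HDFS path, e.g. with hdfs dfs -test -e /output/_SUCCESS):

```shell
# Simulate a job output directory and gate on the _SUCCESS marker
# before reading the reducer output files.
out=$(mktemp -d)
touch "$out/_SUCCESS"                 # marker written on job success
printf 'a\t3\n' > "$out/part-r-00000" # sample reducer output

if [ -f "$out/_SUCCESS" ]; then
  echo "job succeeded, reading results:"
  cat "$out"/part-r-*
else
  echo "no _SUCCESS marker, job may have failed" >&2
fi
```

One part-r-NNNNN file is produced per reducer; this job used a single reducer, hence the single part-r-00000.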
View the job result:
[root@hadoop opt]# hdfs dfs -cat /output/part-r-00000
a 3
b 3
c 3
d 2
qa 1
v 2
Errors: