Preface:
Hadoop background and core components:
(1) Distributed file system: HDFS (modeled on Google's GFS).
(2) Data computation: distributed computing.
1. MapReduce, which originated in search ranking.
2. A large task is split into many small tasks.
3. The Map phase splits the work; the Reduce phase aggregates the computed results (see the pipeline analogy after this list).
(3) BigTable → HBase (NoSQL): data is addressed by row key and column family.
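The Map/Reduce split can be illustrated with a plain Unix pipeline; this word-count analogy is only a sketch and does not involve Hadoop (input.txt is a hypothetical text file):
[root@nn ~]# cat input.txt | tr -s ' ' '\n' | sort | uniq -c
Here tr plays the Map role (emit one word per line), sort plays the shuffle (group identical keys together), and uniq -c plays the Reduce role (count each group).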
Startup: start-all.sh
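After start-all.sh finishes, jps shows whether the daemons came up; in pseudo-distributed mode the expected processes (PIDs omitted here) are roughly:
[root@nn ~]# jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps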
HDFS: stores the data.
YARN: the container in which MapReduce runs.
Access methods:
(1) Command line
(2) Java API
(3) Web Console (web management UI); example commands and URLs follow below
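For example, command-line access goes through hdfs dfs, and the Web Console listens on the Hadoop 2.x default ports (nn is the hostname from my setup; substitute your own):
[root@nn ~]# hdfs dfs -ls /
[root@nn ~]# hdfs dfs -mkdir /input
HDFS NameNode UI: http://nn:50070
YARN ResourceManager UI: http://nn:8088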
Local (standalone) mode:
Characteristics: no HDFS; it can only be used to test MapReduce programs, which read and write the local filesystem (see the smoke test below).
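Because local mode uses the local filesystem, a quick smoke test needs no daemons at all (the in/out paths are illustrative, and $HADOOP_HOME is assumed to point at the install directory):
[root@nn ~]# mkdir in && echo "hello hadoop hello" > in/data.txt
[root@nn ~]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount in out
[root@nn ~]# cat out/part-r-00000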
Pseudo-distributed mode:
Characteristics: has all of Hadoop's functionality, simulating a distributed environment on a single machine.
(1) HDFS: master: NameNode; data node: DataNode.
(2) YARN: the container that runs MapReduce programs.
Master node: ResourceManager.
Worker node: NodeManager.
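A minimal pseudo-distributed configuration is sketched below; the hostname nn is from my machine, so substitute your own, and remember to format the NameNode once with hdfs namenode -format before the first start:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>        <!-- NameNode address -->
    <value>hdfs://nn:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>     <!-- one replica on a single machine -->
    <value>1</value>
  </property>
</configuration>
etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>  <!-- run MapReduce on YARN -->
    <value>yarn</value>
  </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>  <!-- enable the shuffle service -->
    <value>mapreduce_shuffle</value>
  </property>
</configuration>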
Using MapReduce: the example jobs ship as agui/hadoop/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar. Running the jar without arguments lists the available programs:
[root@nn mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.3.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[root@nn mapreduce]#
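An end-to-end run of one of these examples against HDFS looks like this (data.txt and the directory names are illustrative; note that the output directory must not exist before the job runs, or the job fails):
[root@nn mapreduce]# hdfs dfs -mkdir -p /input
[root@nn mapreduce]# hdfs dfs -put data.txt /input
[root@nn mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.3.jar wordcount /input /output
[root@nn mapreduce]# hdfs dfs -cat /output/part-r-00000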
Personal notes
This is a quick record of the setup process, the necessary configuration, and some of the pitfalls I ran into; I hope it helps others.