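Hadoop Pipes is Hadoop's official C++ interface. It communicates with the Map/Reduce framework over a socket. The underlying mechanism is not covered in detail here; the word-count example below shows how to use it.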
1. Code
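#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
const std::string WORDCOUNT = "WORDCOUNT";
const std::string INPUT_WORDS = "INPUT_WORDS";
const std::string OUTPUT_WORDS = "OUTPUT_WORDS";
class WordCountMap: public HadoopPipes::Mapper {  // body assumed: standard Pipes word-count mapper
public:
  HadoopPipes::TaskContext::Counter* inputWords;
  WordCountMap(HadoopPipes::TaskContext& context) { inputWords = context.getCounter(WORDCOUNT, INPUT_WORDS); }
  void map(HadoopPipes::MapContext& context) {  // emit (word, "1") for every word in the input line
    std::vector<std::string> words = HadoopUtils::splitString(context.getInputValue(), " ");
    for (unsigned int i = 0; i < words.size(); ++i) context.emit(words[i], "1");
    context.incrementCounter(inputWords, words.size());
  }
};
class WordCountReduce: public HadoopPipes::Reducer {  // body assumed: standard Pipes word-count reducer
public:
  HadoopPipes::TaskContext::Counter* outputWords;
  WordCountReduce(HadoopPipes::TaskContext& context) { outputWords = context.getCounter(WORDCOUNT, OUTPUT_WORDS); }
  void reduce(HadoopPipes::ReduceContext& context) {  // sum the "1" values emitted for one word
    int sum = 0;
    while (context.nextValue()) sum += HadoopUtils::toInt(context.getInputValue());
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
    context.incrementCounter(outputWords, 1);
  }
};
int main(int argc, char *argv[]) { return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>()); }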
2. Compiling
The makefile is as follows:
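CC = g++
HADOOP_INSTALL = /home/keke/hadoop-0.20.2-cdh3u4
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include
# Recipe assumed: link against the Pipes and utils libraries under c++/$(PLATFORM)/lib.
# Some Hadoop builds may also need -lcrypto on the link line.
wordcount: wordcount.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils -lpthread -g -O2 -o $@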
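3. Running
First copy the executable to HDFS, for example into a bin directory on HDFS (e.g. hadoop fs -put wordcount bin/wordcount).
Then run: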
hadoop pipes -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input /user/keke/input -output output \
    -program bin/wordcount