The Files
You need 3 files to run the maxTemperature example:
- a C++ file containing the map and reduce functions,
- a data file containing some temperature data, such as that found at the National Climatic Data Center (NCDC), and
- a Makefile to compile the C++ file.
max_temperature.cpp
#include <algorithm>
#include <limits>
#include <string>
#include "stdint.h" // <-- this is missing from the book
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
using namespace std;

class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {
  }
  void map(HadoopPipes::MapContext& context) {
    // Each input value is one fixed-width NCDC record.
    string line = context.getInputValue();
    string year = line.substr(15, 4);
    string airTemperature = line.substr(87, 5);
    string q = line.substr(92, 1);
    // Skip missing readings (+9999) and keep only quality codes 0, 1, 4, 5 and 9.
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {
      context.emit(year, airTemperature);
    }
  }
};

class MapTemperatureReducer : public HadoopPipes::Reducer {
public:
  MapTemperatureReducer(HadoopPipes::TaskContext& context) {
  }
  void reduce(HadoopPipes::ReduceContext& context) {
    // Start below any value that fits in the five-character temperature field.
    int maxValue = -10000;
    while (context.nextValue()) {
      maxValue = max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<MaxTemperatureMapper,
                              MapTemperatureReducer>());
}
Makefile
Create a Makefile with the following entries. Note that you need to find out whether your computer has a 32-bit or a 64-bit processor, and pick the matching library. To find out, run the following command:
uname -a
To which the OS responds:
Linux hadoop6 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux
The i686 indicates a 32-bit machine, for which you need to use the Linux-i386-32 library. A machine string containing 64, such as x86_64, indicates a 64-bit machine, for which you use the Linux-amd64-64 library instead (set PLATFORM accordingly and replace -m32 with -m64).
CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

max_temperature: max_temperature.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@
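The rule for choosing between the two library directories can also be written down as a small function. The following is only a sketch: platformFor and currentMachine are hypothetical helper names, not part of Hadoop, and the "contains 64" test assumes x86 hardware, as in this tutorial. The machine string comes from the POSIX uname() call, which reports the same value as uname -m.

```cpp
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: maps a machine string (as printed by `uname -m`)
// to the library directory name used for PLATFORM in the Makefile.
// Assumes x86 hardware, so any string containing "64" means amd64.
std::string platformFor(const std::string& machine) {
    if (machine.find("64") != std::string::npos) {
        return "Linux-amd64-64";
    }
    return "Linux-i386-32";
}

// Hypothetical helper: asks the OS for the machine string via POSIX uname().
// Returns an empty string if the call fails.
std::string currentMachine() {
    utsname info;
    if (uname(&info) != 0) {
        return "";
    }
    return info.machine;
}
```

Calling platformFor(currentMachine()) on the machine shown above (i686) would give Linux-i386-32, which is the value to put in PLATFORM.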
Data File
- Create a file called sample.txt which will contain sample temperature data from the NCDC.
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
- Put the data file in HDFS:
hadoop dfs -mkdir ncdc
hadoop dfs -put sample.txt ncdc
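Each record in sample.txt must sit on its own line, because the mapper reads fixed column offsets from every input line. To check the offsets locally, without Hadoop, the mapper's extraction logic can be sketched as a plain function. Reading and parseRecord are hypothetical names introduced for this sketch, and the length guard is an addition that the mapper itself does not have.

```cpp
#include <string>

// Hypothetical record type for this sketch; it is not part of Hadoop Pipes.
struct Reading {
    bool usable;              // true when the record passes the mapper's filter
    std::string year;         // characters 15-18 of the line
    std::string temperature;  // characters 87-91 of the line, sign included
};

// Applies the same offsets and filter as MaxTemperatureMapper::map.
// The length guard is an addition for the sketch; the mapper has none.
Reading parseRecord(const std::string& line) {
    Reading r{false, "", ""};
    if (line.size() < 93) {
        return r;  // too short to hold the quality code at offset 92
    }
    const std::string airTemperature = line.substr(87, 5);
    const std::string q = line.substr(92, 1);
    const bool acceptedQuality =
        q == "0" || q == "1" || q == "4" || q == "5" || q == "9";
    if (airTemperature != "+9999" && acceptedQuality) {
        r.usable = true;
        r.year = line.substr(15, 4);
        r.temperature = airTemperature;
    }
    return r;
}
```

For the second sample record, parseRecord yields the year 1950 and the temperature string +0022, which is the pair the mapper would emit.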
Compiling and Running
- You need a C++ compiler. GNU g++ is probably the best choice. Check that it is installed (by typing g++ at the prompt). If it is not installed yet, install it!
sudo apt-get install g++
- Compile the code:
make max_temperature
- Fix any errors the compiler reports, then run make again.
- Copy the executable file (max_temperature) to a bin directory in HDFS:
hadoop dfs -mkdir bin
hadoop dfs -put max_temperature bin/max_temperature
- Run the program!
hadoop pipes -D hadoop.pipes.java.recordreader=true \
-D hadoop.pipes.java.recordwriter=true \
-input ncdc/sample.txt -output ncdc-out \
-program bin/max_temperature
- Verify that you have gotten the right output:
hadoop dfs -text ncdc-out/part-00000
1949 111
1950 22
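To see where these two numbers come from, the reduce step can be replayed locally on the pairs the mapper emits for the five sample records. This is a sketch only: maxPerYear is a hypothetical function, std::map stands in for Hadoop's grouping by key, and std::stoi stands in for HadoopUtils::toInt.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reduce phase: takes the (year, temperature)
// pairs a mapper would emit and keeps the largest temperature per year.
std::map<std::string, int> maxPerYear(
        const std::vector<std::pair<std::string, std::string>>& emitted) {
    std::map<std::string, int> result;
    for (const auto& kv : emitted) {
        // std::stoi accepts the leading sign and zeros, e.g. "+0022" or "-0011".
        const int value = std::stoi(kv.second);
        // Same starting value as the reducer in max_temperature.cpp.
        auto it = result.emplace(kv.first, -10000).first;
        it->second = std::max(it->second, value);
    }
    return result;
}
```

The sample records produce the pairs (1950, +0000), (1950, +0022), (1950, -0011), (1949, +0111) and (1949, +0078). Passing them to maxPerYear gives 111 for 1949 and 22 for 1950, matching the job output.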
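This article describes how to process temperature data with C++ and Hadoop: create a file containing the map and reduce functions, supply a data file, and compile the code with a Makefile to compute the maximum temperature.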