Hadoop: The Definitive Guide, 4th Edition
===================
I. Hadoop Fundamentals
---------------
1. Meet Hadoop
2. MapReduce
—————————
Hadoop provides its own set of basic types that are optimized for network serialization.
Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
To minimize the data transferred between map and reduce tasks, Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function’s output forms the input to the reduce function.
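The split/map/combine/reduce flow above can be simulated locally. The sketch below uses the book’s max-temperature-per-year example, but the `(year, temperature)` records are made-up toy data, and the functions are plain Python stand-ins for what Hadoop runs, not Hadoop APIs:

```python
from itertools import groupby
from operator import itemgetter

# Toy records standing in for parsed NCDC lines; the real mapper
# would extract (year, temperature) from fixed-width NCDC strings.
RECORDS = [("1950", 0), ("1950", 22), ("1950", -11),
           ("1949", 111), ("1949", 78)]

def map_phase(records):
    # One (year, temp) pair emitted per input record in a split.
    return list(records)

def combine(pairs):
    # Runs on each mapper's local output: keep only the local maximum
    # per year, shrinking what must cross the network to the reducers.
    pairs = sorted(pairs, key=itemgetter(0))
    return [(year, max(t for _, t in group))
            for year, group in groupby(pairs, key=itemgetter(0))]

def reduce_phase(pairs):
    # max is safe to use as a combiner because the maximum of local
    # maxima equals the global maximum (max is associative/commutative).
    return dict(combine(pairs))

result = reduce_phase(combine(map_phase(RECORDS)))
print(result)  # {'1949': 111, '1950': 22}
```

Note that not every reduce function can double as a combiner: max and sum can, but mean cannot, since a mean of partial means is not the overall mean.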
Sample problems:
1. Fetching the NCDC data with a script
2. Fetching it with Hadoop
When calling load_ncds.sh to fetch the NCDC data, the job failed with PipeMapRed.waitOutputThreads(): subprocess failed with code 127. For this error code, see: http://blog.youkuaiyun.com/oDaiLiDong/article/details/46803603
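Exit code 127 is the POSIX shell’s “command not found” status, which PipeMapRed surfaces when a task node cannot launch the streaming script. A minimal local reproduction of the code itself (the command name is deliberately fake):

```shell
# 127 means the shell could not find the command to run; in a streaming
# job this typically means the script was not shipped to the task nodes
# or its shebang interpreter is missing there.
sh -c 'definitely_not_a_real_command_xyz' 2>/dev/null
echo "exit code: $?"   # prints: exit code: 127
```

The usual fix is to ship the script with the job (e.g. the streaming `-file load_ncds.sh` option) and to make sure the interpreter named in its shebang line exists on every node.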
Added support for AWS, but because the AWS access key could not be found, this approach remains unresolved for now.
For some reason, the jar hadoop-aws-[version].jar, which contains the implementation of NativeS3FileSystem, is not on Hadoop’s classpath by default in versions 2.6 and 2.7. Try adding it to the classpath by appending the following line to hadoop-env.sh, located in $HADOOP_HOME/etc/hadoop/:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*
This assumes you are using Apache Hadoop 2.6 or 2.7.
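As for the missing AWS access key noted above: for NativeS3FileSystem (s3n://) in Hadoop 2.6/2.7, credentials can be supplied in core-site.xml via the properties below. A sketch with placeholder values (substitute your own keys; whether this resolves the issue in this setup is untested):

```xml
<!-- core-site.xml: credentials for NativeS3FileSystem (s3n://) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```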