Project repository: https://github.com/jimingkang/offline_data_analystics.git
1. Download hadoop-2.5.0-cdh5.3.6 and extract it to E:\hadoop-2.5.0-cdh5.3.6
2. Add winutils.exe under the bin directory (E:\hadoop-2.5.0-cdh5.3.6\bin)
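If HADOOP_HOME is not set as an environment variable, a common alternative on Windows (an assumption here, not something the repository necessarily does) is to set the hadoop.home.dir system property before any Hadoop class is loaded, so that winutils.exe under bin can be found:

public final class WindowsHadoopEnv {
    // Hypothetical helper, not part of the repository; call init() first thing in main().
    public static void init() {
        if (System.getProperty("hadoop.home.dir") == null) {
            // Point Hadoop at the unpacked distribution so bin\winutils.exe is resolvable.
            System.setProperty("hadoop.home.dir", "E:\\hadoop-2.5.0-cdh5.3.6");
        }
    }
}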
3. Add the modified NativeIO class to the package org.apache.hadoop.io.nativeio under src/main/extr
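The patched NativeIO is needed because the stock class performs a native access check that fails on Windows without a matching hadoop.dll. The commonly applied edit, sketched below, is an assumption and may differ from the copy actually shipped in src/main/extr:

// Inside the copied org.apache.hadoop.io.nativeio.NativeIO.Windows class:
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Skip the native access0() call, which fails without a matching hadoop.dll.
    return true;
}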
4. To load the data from the log file into HBase:
a) Adjust the configuration:
public void setConf(Configuration that) {
    // For local runs; comment out the next two lines when running on the cluster.
    that.set("fs.defaultFS", "hdfs://192.168.199.198:9000");
    that.set("hbase.zookeeper.quorum", "192.168.199.198:2181");
    this.conf = HBaseConfiguration.create(that);
}
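With that configuration in place the job can reach HBase. The standalone connectivity check below is a minimal sketch using the older HTable-based client API shipped with CDH 5.3.6; the table name, column family, row key and values are illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.199.198:2181");
        // "eventlog", "info" and the row key are assumptions, not the project's real names.
        HTable table = new HTable(conf, "eventlog");
        Put put = new Put(Bytes.toBytes("smoke-test-row"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("uid"), Bytes.toBytes("u001"));
        table.put(put);
        table.close();
    }
}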
b) Upload 20151220.log to /eventLogs/20151220 on HDFS (as sketched below)
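The upload can be done from the command line (hdfs dfs -mkdir -p /eventLogs/20151220, then hdfs dfs -put 20151220.log /eventLogs/20151220) or programmatically. A minimal sketch, assuming the log sits at an illustrative local path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadEventLog {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.199.198:9000");
        FileSystem fs = FileSystem.get(conf);
        // The local source path is an assumption; adjust to wherever 20151220.log lives.
        fs.copyFromLocalFile(new Path("E:\\logs\\20151220.log"),
                             new Path("/eventLogs/20151220/20151220.log"));
        fs.close();
    }
}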
c) Place the IP-lookup file (qqwry.dat) at E:\qqwry.dat
d) Select local mode in pom.xml via the following profile:
<profile>
  <!-- Local (Windows) development profile -->
  <id>local</id>
  <activation>
    <!-- Make this the default profile -->
    <activeByDefault>true</activeByDefault>
  </activation>
  <build>
    <plugins>
      <plugin>
        <!-- Compile Java sources from more than one source directory -->
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.4</version>
        <executions>
          <execution>
            <id>add-source</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>${basedir}/src/main/java</source>
                <source>${basedir}/src/main/extr</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
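Because the profile is marked activeByDefault, a plain mvn clean package picks it up; passing -Plocal makes the choice explicit.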
e) Update the IP addresses in the resource files (core-site.xml, etc.) so they point at your cluster
f) Run ETLDriver (see the launch sketch below)
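For launching from the IDE or a plain main(), a minimal launcher sketch; it assumes ETLDriver implements org.apache.hadoop.util.Tool (which the setConf() override above suggests) and takes whatever arguments the project expects:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
// Also import ETLDriver from its package in the repository.

public class LocalEtlLauncher {
    // Hypothetical launcher; only valid if ETLDriver really implements Tool.
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new ETLDriver(), args);
        System.exit(exitCode);
    }
}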