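Mahout 0.9 supports only Hadoop 1.x. Hadoop 2.x is more flexible, more capable, and more widely used than 1.x, so support for it was bound to come. The Mahout trunk already supports Hadoop 2.2; below I build the source against Hadoop 2.6, because that is the version running in my own environment.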
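1. Clone the source code from GitHub.
2. Edit the pom.xml file:
Find the hadoop.version entry and change it to 2.6.0.
In the profile section, change hadoop1 to hadoop2.
Change the Guava version from 16.0 to 14.0.
Reason: Hadoop ships Guava 11.0.2 while Mahout uses 16.0, and the two are incompatible. Hadoop calls Stopwatch.elapsedMillis(), which cannot be resolved when Guava 16.0 is on the classpath, so running a program fails with the following error: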
java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J
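To confirm which Guava jar actually wins on the classpath, a small reflection check can help. The following is only a sketch: the class name GuavaStopwatchCheck and its helper methods are made up for illustration, and it uses reflection so that it compiles even when Guava is absent.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class GuavaStopwatchCheck {

    /** Returns true if the class can be loaded and has a public no-arg method with this name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Best-effort location (jar or directory) that the class was loaded from. */
    public static String jarOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String stopwatch = "com.google.common.base.Stopwatch";
        // Shows which Guava jar is in effect for this classpath.
        System.out.println("Stopwatch loaded from: " + jarOf(stopwatch));
        // false here means the NoSuchMethodError above is to be expected.
        System.out.println("elapsedMillis() present: " + hasNoArgMethod(stopwatch, "elapsedMillis"));
    }
}
```

Run it with the same classpath as the failing job; if elapsedMillis() is reported as missing, the Guava version still needs to be lowered.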
3. Run mvn clean install -Dhadoop2 -Dhadoop2.version=2.6.0 -DskipTests
4. Create a Maven project in Eclipse and add the following dependencies to its pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Hadoop</groupId>
  <artifactId>Mahout</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>Mahout</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-mrlegacy</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-integration</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-examples</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
</project>
5. Now you can do everything!
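Before writing real jobs, it can be worth checking that the new project resolves both the locally built Mahout artifacts and the Hadoop 2.6.0 client. The sketch below is one possible smoke test, not part of the original setup: ClasspathSmokeTest and its helpers are hypothetical names, and it assumes org.apache.hadoop.util.VersionInfo and org.apache.mahout.math.DenseVector are pulled in through the dependencies listed above. Reflection is used so the class compiles even if a dependency is missing.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ClasspathSmokeTest {

    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Calls a public static no-arg method reflectively; returns null if it is unavailable. */
    public static Object callStatic(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            if (!Modifier.isStatic(m.getModifiers())) {
                return null; // only static methods can be invoked without an instance
            }
            return m.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Should report the Hadoop version the project resolved, or null if Hadoop is absent.
        System.out.println("Hadoop version: "
                + callStatic("org.apache.hadoop.util.VersionInfo", "getVersion"));
        // true means the Mahout math classes from the local build are visible.
        System.out.println("Mahout math on classpath: "
                + isLoadable("org.apache.mahout.math.DenseVector"));
    }
}
```

If the Hadoop version prints as null or the Mahout check prints false, the step 3 build probably did not install the artifacts into the local Maven repository.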