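We recently needed to upgrade our platform environment. Our program is based on Mahout 0.9 (we modified its source code directly). The compute platform previously ran Hadoop 1.2.1 and is now being upgraded to 2.6.0, so the Mahout program cannot run on it as-is.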
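1. First, change the Hadoop jar dependencies in the Mahout parent Maven project
In the pom.xml of the Mahout parent Maven project, change <hadoop.version>1.2.1</hadoop.version> to <hadoop.version>2.6.0</hadoop.version>.
Remove the hadoop-core dependency and replace it with hadoop-hdfs: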
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
A complete set of dependencies for a Hadoop program should include either hadoop-core (for 1.x) or, for 2.x, hadoop-hdfs together with hadoop-common, hadoop-mapreduce-client-core, and hadoop-mapreduce-client-common.
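For reference, the 2.x dependency set listed above could be declared in the parent pom.xml roughly as follows. This is only a sketch: placing the entries under dependencyManagement is an assumption, and it reuses the ${hadoop.version} property from the hadoop-hdfs snippet above.

```xml
<!-- Sketch for the parent pom.xml. Putting these under dependencyManagement
     is an assumption; adapt it to where the project declares Hadoop jars. -->
<dependencyManagement>
  <dependencies>
    <!-- Replaces hadoop-core from Hadoop 1.x -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Managing the versions in one place lets the child modules list these artifacts without a version element, which is how the mahout-core profiles in step 2 refer to them.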
After the replacement the project reports an error saying that a JDK artifact is missing. Adding the following JDK dependency resolves it:
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.6</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
2. Modify the mahout-core subproject: in its pom.xml, change
<profiles>
<profile>
<id>hadoop-0.20</id>
<activation>
<property>
<name>!hadoop.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
</dependency>
</dependencies>
</profile>
<profile>
<id>hadoop-0.23</id>
<activation>
<property>
<name>hadoop.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
</dependency>
</dependencies>
</profile>
</profiles>
to
<profiles>
<profile>
<id>hadoop-0.20</id>
<activation>
<property>
<name>!hadoop.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
</dependency>
</dependencies>
</profile>
<profile>
<id>hadoop-0.23</id>
<activation>
<property>
<name>hadoop.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
</dependency>
</dependencies>
</profile>
</profiles>
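The goal is simply to swap the dependency jars. Why edit this profile instead of creating a new one? We could certainly create a new profile, but the Mahout project uses the hadoop-0.20 profile by default: its activation condition is !hadoop.version, so it is active whenever hadoop.version is not passed as a property on the command line, as shown in the figure below.
So the cleaner approach is to create our own profile and add the corresponding dependencies to it; to save effort I edited the hadoop-0.20 profile directly. A profile can also be specified explicitly when compiling the Mahout source. With these changes in place, the tests finally passed.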
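The "create our own profile" alternative mentioned above might look something like the sketch below. The profile id hadoop-2.6 is made up for illustration, and the dependencies omit versions on the assumption that the parent pom manages them, as the existing profiles do.

```xml
<!-- Hypothetical profile for mahout-core/pom.xml, to be placed inside <profiles>.
     The id "hadoop-2.6" is an invented name, not something Mahout ships. -->
<profile>
  <id>hadoop-2.6</id>
  <!-- No <activation> block: the profile is only used when selected with -P -->
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
    </dependency>
  </dependencies>
</profile>
```

With such a profile in place, a build could select it explicitly, for example with mvn clean install -P hadoop-2.6. Because the unmodified hadoop-0.20 profile is activated whenever hadoop.version is not set on the command line, it would probably also need to be switched off in the same invocation (Maven accepts a leading ! before a profile id in -P for this). Otherwise the dependencies of both profiles would be combined.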