A project of mine needed to read data out of HBase, and I happened to notice that Phoenix ships a Spark integration module, so I gave it a try. The dependencies were:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.3</version>
</dependency>
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark</artifactId>
    <version>4.7.0-HBase-1.1</version>
</dependency>
Compilation then failed with the following error:
error: bad symbolic reference. A signature in package.class refers to term sql
in package org.apache.spark which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling package.class.
map.getOrDefault(prefix + "has_canceled",BigDecimal.ZERO).asInstanceOf[BigDecimal]
Because of that last line, my first guess was a Scala version conflict. But even after adding Scala as an explicit dependency in the pom and removing the locally installed Scala SDK from the project, the error persisted. Then I noticed this fragment:
sql in package org.apache.spark
I had never used any Spark SQL API, so why was it showing up at all? The reason is that the phoenix-spark module contains a package object, org.apache.phoenix.spark:
package org.apache.phoenix

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

package object spark {
  implicit def toProductRDDFunctions[A <: Product](rdd: RDD[A]): ProductRDDFunctions[A] = {
    new ProductRDDFunctions[A](rdd)
  }

  implicit def toSparkContextFunctions(sc: SparkContext): SparkContextFunctions = {
    new SparkContextFunctions(sc)
  }

  implicit def toSparkSqlContextFunctions(sqlContext: SQLContext): SparkSqlContextFunctions = {
    new SparkSqlContextFunctions(sqlContext)
  }

  implicit def toDataFrameFunctions(data: DataFrame): DataFrameFunctions = {
    new DataFrameFunctions(data)
  }
}
The package object imports Spark SQL's SQLContext and DataFrame and defines implicit conversions on them. My code used import org.apache.phoenix.spark._, and it is exactly that _ which pulls SQLContext and DataFrame into the compiler's resolution scope; since my project declared no Spark SQL dependency, compilation failed with the error above.
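To illustrate, a snippet as small as the following is enough to reproduce the error when spark-sql is missing from the classpath. It is only a sketch: the table name, columns and ZooKeeper URL are placeholders, and it touches nothing but the RDD-based API from SparkContextFunctions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.phoenix.spark._  // the wildcard brings in all four implicits above

object PhoenixReadJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-read"))
    // Only the RDD-based API is used here, yet the compiler still has to
    // resolve SQLContext and DataFrame in order to type-check the package
    // object's implicit signatures, hence the "bad symbolic reference" error.
    val rdd = sc.phoenixTableAsRDD("MY_TABLE", Seq("ID", "COL1"),
      zkUrl = Some("zk-host:2181"))
    println(rdd.count())
    sc.stop()
  }
}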
The fix is simple: add the Spark SQL dependency to the pom. (Simply avoiding the Spark SQL-related implicits under org.apache.phoenix.spark is not enough, because the underlying code still relies on Spark SQL's DataType class.)
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.3</version>
</dependency>
Detailed usage of Phoenix-Spark is documented on the official site: http://phoenix.apache.org/phoenix_spark.html
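With spark-sql on the classpath, the SQLContext-based API exposed by the package object becomes usable as well. A rough sketch along the lines of the examples in that documentation (table name, columns and zkUrl are placeholders; sc is an existing SparkContext):

import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

val sqlContext = new SQLContext(sc)
// Load a Phoenix table as a DataFrame via the implicit SparkSqlContextFunctions
val df = sqlContext.phoenixTableAsDataFrame("MY_TABLE", Seq("ID", "COL1"),
  zkUrl = Some("zk-host:2181"))
df.show()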