When using Spark MLlib, a trained model is saved, and the online service then needs to load that model to serve predictions.
When the model is actually loaded, the following exception is thrown: Native snappy library not available: this version of libhadoop was built without snappy support
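For context, here is a minimal sketch of the save/load flow involved, assuming an RDD-based mllib model; the model type (LogisticRegressionModel), paths, and toy data are illustrative assumptions, not the original code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext(new SparkConf().setAppName("mllib-save-load").setMaster("local[*]"))

// Offline training job: fit a model on toy data and persist it (the path is a placeholder).
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
  LabeledPoint(0.0, Vectors.dense(0.2, 1.5))
))
val model = new LogisticRegressionWithLBFGS().run(training)
model.save(sc, "/tmp/lr-model")

// Online service: load the persisted model to serve predictions.
// This load call is where the snappy exception surfaced.
val loaded = LogisticRegressionModel.load(sc, "/tmp/lr-model")
println(loaded.predict(Vectors.dense(1.0, 0.5)))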
Some digging showed that this happens because the Hadoop installation lacks native snappy support, so there are two ways around it: one is to switch to a different compression codec, the other is to make Hadoop's native library (with snappy support) visible to the Spark job:
- One approach is to use a different Hadoop codec, as below. Note that these settings affect how the model files are written, so the model needs to be re-saved with them before it can be loaded without native snappy:

import org.apache.hadoop.io.SequenceFile.CompressionType  // assuming Hadoop's SequenceFile CompressionType

sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress", "true")
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress.type", CompressionType.BLOCK.toString)
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.BZip2Codec")
sc.hadoopConfiguration.set("mapreduce.map.output.compress", "true")
sc.hadoopConfiguration.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.BZip2Codec")
- The second approach is to pass --driver-library-path /usr/hdp/<whatever is your current version>/hadoop/lib/native/ as a parameter to the spark-submit job (on the command line), so the JVM can find Hadoop's native libraries, including snappy; a sketch of such an invocation follows.
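A minimal sketch of what that submit command could look like (the application class and jar names are made-up placeholders; if executors also read the model files, spark.executor.extraLibraryPath may need the same path):

spark-submit \
  --class com.example.OnlinePredictService \
  --driver-library-path /usr/hdp/<whatever is your current version>/hadoop/lib/native/ \
  --conf spark.executor.extraLibraryPath=/usr/hdp/<whatever is your current version>/hadoop/lib/native/ \
  online-predict-service.jar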
To summarize: this post looked at the "Native snappy library not available" exception hit when loading a model with Spark MLlib, and gave two solutions: adjust the Hadoop configuration to use BZip2 compression instead of snappy, or make sure the Spark job is given the path to the Hadoop native libraries.