Setting FsUrlStreamHandlerFactory has no effect because of an incorrect packaging configuration
Java cannot parse URLs with the hdfs:// prefix by default, so you must first register the Hadoop handler:
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
However, if the packaging configuration is wrong, the program runs fine locally but still fails to recognize the scheme once it is built into a jar.
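To see why the registration matters, here is a minimal stdlib-only sketch (no Hadoop on the classpath) that reproduces the first error below: without a registered stream handler, java.net.URL rejects the hdfs scheme. The namenode host and path are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class HdfsSchemeDemo {
    public static void main(String[] args) {
        try {
            // Without URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()),
            // java.net.URL has no handler for the hdfs scheme and throws.
            new URL("hdfs://namenode:9000/user/test.txt");
            System.out.println("hdfs scheme recognized");
        } catch (MalformedURLException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```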
Symptoms
Exception in thread "main" java.net.MalformedURLException: unknown protocol: hdfs
at java.net.URL.<init>(URL.java:592)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at in.ksharma.hdfs.FileReader.main(FileReader.java:29)
Or:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
After half a day of googling, I found a few candidate fixes.
Fix 1
Add hadoop-2.X/share/hadoop/hdfs/hadoop-hdfs-2.X.jar to your classpath.
参考:https://stackoverflow.com/questions/25971333/malformedurlexception-on-reading-file-from-hdfs
This had no effect for me.
Fix 2
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
In Maven, you configure the shade plugin as above; the key setting is
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
which merges service descriptor files instead of letting one copy overwrite another.
Since this project is packaged with Gradle's shadow plugin instead, I checked the shadow documentation:
Merging Service Descriptor Files
Java libraries often contain service descriptor files in the META-INF/services directory of the JAR. A service descriptor typically contains a line-delimited list of classes that are supported for a particular service. At runtime, this file is read and used to configure library or application behavior.
Multiple dependencies may use the same service descriptor file name. In this case, it is generally desired to merge the content of each instance of the file into a single output file. The ServiceFileTransformer class is used to perform this merging. By default, it will merge each copy of a file under META-INF/services into a single file in the output JAR.
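As a rough illustration of what the transformer does (a simplified simulation, not the actual shade/shadow code): each dependency's copy of, say, META-INF/services/org.apache.hadoop.fs.FileSystem contributes its lines, and the merged output keeps the union. The descriptor contents below are shortened examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ServiceFileMergeDemo {
    // Simplified model of a service-file merge: concatenate every copy of
    // the same descriptor, dropping duplicate lines while keeping order.
    static List<String> merge(List<List<String>> copies) {
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (List<String> copy : copies) {
            merged.addAll(copy);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Example entries for META-INF/services/org.apache.hadoop.fs.FileSystem
        List<String> fromHadoopCommon = List.of("org.apache.hadoop.fs.LocalFileSystem");
        List<String> fromHadoopHdfs = List.of("org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println(merge(List.of(fromHadoopCommon, fromHadoopHdfs)));
    }
}
```

Without such a merge, only one jar's descriptor survives in the fat jar, so FileSystem cannot resolve implementations for the other schemes, which matches the "No FileSystem for scheme" failure above.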
// Merging Service Files
shadowJar {
    mergeServiceFiles()
}
Adding the configuration above is equivalent to the Maven setting
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
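For completeness, a minimal build.gradle sketch showing where mergeServiceFiles() goes. The plugin version is illustrative; check the shadow documentation for the release matching your Gradle version.

```groovy
plugins {
    id 'java'
    // version is illustrative; pick the one matching your Gradle release
    id 'com.github.johnrengelman.shadow' version '6.1.0'
}

shadowJar {
    // Merge every copy of the same META-INF/services file into one,
    // equivalent to Maven's ServicesResourceTransformer
    mergeServiceFiles()
}
```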
References:
https://www.codelast.com/原创-解决读写hdfs文件的错误:no-filesystem-for-scheme-hdfs/
https://imperceptiblethoughts.com/shadow/configuration/merging/#merging-service-descriptor-files
Conclusion
After repackaging with this configuration, the problem was resolved and the program ran normally.