Method 1
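import java.io.IOException;
import java.io.InputStream;

/**
 * Guesses a file's encoding from the first two bytes of the stream.
 *
 * @param in stream positioned at the start of the file; two bytes are consumed
 * @return the guessed encoding name, or "GBK" when nothing matches
 */
public static String getFileCode(InputStream in) {
    try {
        // Combine the first two bytes into a single 16-bit value.
        int p = (in.read() << 8) + in.read();
        String code = "GBK";
        switch (p) {
            case 0xefbb: // first two bytes of the UTF-8 BOM (EF BB BF)
            case 59524:  // 0xE884: not a BOM; matches only BOM-less files that start with bytes E8 84
                code = "UTF-8";
                break;
            case 0xfffe: // FF FE: BOM of little-endian UTF-16
                code = "Unicode";
                break;
            case 0xfeff: // FE FF: BOM of big-endian UTF-16
                code = "UTF-16BE";
                break;
            case 48581:  // 0xBDC5
                code = "GBK";
                break;
            default:
                // No match: keep the GBK fallback.
        }
        return code;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}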
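The code above is adapted from the article linked below, with the UTF-8 value changed to 59524. It looks only at the first two bytes of the stream, so it recognizes just a few encodings and falls back to GBK otherwise.
https://my.oschina.net/lipengxs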
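For comparison, here is a minimal sketch of a stricter byte-order-mark check. It is not from the linked article: `BomSniffer` and `detectByBom` are hypothetical names, and the fallback value is whatever the caller chooses. It reads up to three bytes so the complete UTF-8 BOM (EF BB BF) can be verified, and it reports the fallback whenever no BOM is present, since a file without a BOM cannot be identified from its first bytes alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the original post.
final class BomSniffer {

    /** Reads up to three bytes and maps a leading BOM to a charset name. */
    static String detectByBom(InputStream in, String fallback) {
        byte[] head = new byte[3];
        int n = 0;
        try {
            // read() may deliver fewer bytes than requested, so loop until full or EOF.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        int b2 = head[2] & 0xFF;
        if (n >= 3 && b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return "UTF-8";
        }
        if (n >= 2 && b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        if (n >= 2 && b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        // No BOM found: these bytes alone do not identify the encoding.
        return fallback;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detectByBom(new ByteArrayInputStream(utf8WithBom), "GBK"));
    }
}
```

Because the sketch consumes the bytes it inspects, a caller that wants to decode the same stream afterwards would need to reopen it or wrap it so the bytes can be re-read.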
Method 2
Use the detector provided by juniversalchardet. It covers far more encodings than Method 1 and is the recommended approach.
POM
<dependency>
    <groupId>com.github.albfernandez</groupId>
    <artifactId>juniversalchardet</artifactId>
    <version>2.4.0</version>
</dependency>
code
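import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Objects;
import org.mozilla.universalchardet.UniversalDetector;

private static String getFileCharacter(String fileName) {
    byte[] buf = new byte[4096];
    try (InputStream inputStream = Files.newInputStream(Paths.get(fileName))) {
        UniversalDetector detector = new UniversalDetector();
        int nread;
        // Feed the file in chunks until EOF or until the detector is confident.
        while ((nread = inputStream.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        // Signal end of input so the detector can settle on a result.
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        // getDetectedCharset() returns null when no encoding could be determined.
        if (Objects.isNull(encoding)) {
            throw new IllegalArgumentException("No encoding detected.");
        }
        return encoding;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}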
test
public static void main(String[] args) {
    String fileName = "C:\\Users\\SERVER\\Downloads\\A.csv";
    String fileCharacter = getFileCharacter(fileName);
    System.out.println("fileCharacter = " + fileCharacter);
}
## output
fileCharacter = GB18030
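Once a charset name such as GB18030 is known, it can be used to decode the file. The following is a minimal sketch using only the JDK; `DetectedCharsetReader`, `decode`, and `readAll` are hypothetical names, and it assumes the detected name is one the running JVM supports, which is checked with `Charset.isSupported` before decoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the original post.
final class DetectedCharsetReader {

    /** Decodes raw bytes with the named charset, rejecting names the JVM does not support. */
    static String decode(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        return new String(data, Charset.forName(charsetName));
    }

    /** Reads a whole file and decodes it with the named charset. */
    static String readAll(Path file, String charsetName) {
        try {
            return decode(Files.readAllBytes(file), charsetName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two characters U+4E2D U+6587 encoded as GB bytes.
        byte[] gbBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(decode(gbBytes, "GB18030").equals("\u4e2d\u6587"));
    }
}
```

In practice the second argument of `readAll` would be the string returned by `getFileCharacter`.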
For the differences between GB18030, GB2312, and GBK, see
https://zhuanlan.zhihu.com/p/453675608
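As a rough illustration of how the three relate, each one covers more characters than the previous: GB2312 holds mostly simplified characters, GBK adds many more CJK ideographs, and GB18030 also reaches characters outside the Basic Multilingual Plane. The sketch below is not from the linked article; `GbFamilyDemo` and its `canEncode` helper are hypothetical names, and it assumes the JVM ships all three charsets.

```java
import java.nio.charset.Charset;

// Hypothetical demo for illustration; not part of the original post.
final class GbFamilyDemo {

    /** Returns true when every character of text can be encoded in the named charset. */
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u4e2d\u6587";        // two common simplified characters
        String traditional = "\u9f8d";             // traditional form of "dragon"
        String supplementary = "\uD840\uDC00";     // U+20000, outside the BMP
        for (String cs : new String[] {"GB2312", "GBK", "GB18030"}) {
            System.out.println(cs
                    + " simplified=" + canEncode(cs, simplified)
                    + " traditional=" + canEncode(cs, traditional)
                    + " supplementary=" + canEncode(cs, supplementary));
        }
    }
}
```

This is one reason a detector may report GB18030 for a file that was saved as GBK: text valid in the smaller sets is generally valid in the larger one as well.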