Detecting a file's encoding when reading it in Java

Original article · last modified 2023-09-01 18:25:11


License: CC 4.0 BY-SA

Tags: #java

First published 2023-08-25 19:02:43


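This article shows how to detect the character encoding of text in Java with the juniversalchardet library (its UniversalDetector class), with a basic example for a string and a version that works on a file path.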

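In Java, you can detect the encoding of a byte sequence with juniversalchardet, a Java port of universalchardet, the encoding detector originally developed by Mozilla. The class that does the detection is UniversalDetector.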

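Detection works on raw bytes, not on a String: a Java String has no encoding of its own, so text must first be turned into a byte[] (for a file, the bytes are simply read from disk). The byte[] is then passed to UniversalDetector, which guesses the charset.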
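To see why the charset used in the String-to-bytes step matters, here is a small JDK-only sketch (not from the original article; the class and method names are illustrative). The same two Chinese characters become 6 bytes in UTF-8 (3 per character) but 4 bytes in GBK (2 per character), so a detector is only ever judging whichever bytes you hand it. It assumes the JDK includes the GBK charset, which standard JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative demo class (hypothetical name), not part of juniversalchardet
public class EncodedBytesDemo {
    // Number of bytes produced when the string is encoded with the given charset
    public static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "测试"; // "test"
        System.out.println("UTF-8 bytes: " + encodedLength(s, StandardCharsets.UTF_8)); // 6
        System.out.println("GBK bytes:   " + encodedLength(s, Charset.forName("GBK"))); // 4
        // getBytes() with no argument uses this platform default charset
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Passing an explicit Charset to getBytes avoids depending on the platform default, which differs between systems and JDK versions.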

The examples below use the albfernandez fork of juniversalchardet. Add it to your Maven pom.xml:

        <!-- File encoding detection library -->
        <dependency>
            <groupId>com.github.albfernandez</groupId>
            <artifactId>juniversalchardet</artifactId>
            <version>2.3.0</version>
        </dependency>

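A simple example that detects the encoding of a string's bytes:

import java.nio.charset.StandardCharsets;

import org.mozilla.universalchardet.UniversalDetector;

public class CharsetDetectorDemo {
    public static void main(String[] args) {
        String testString = "这是一个测试字符串"; // "This is a test string"
        // Encode explicitly: getBytes() with no argument uses the platform default charset
        byte[] testData = testString.getBytes(StandardCharsets.UTF_8);

        // Create the detector (null = no CharsetListener callback)
        UniversalDetector detector = new UniversalDetector(null);

        // Feed the bytes to the detector
        detector.handleData(testData, 0, testData.length);

        // Signal that all data has been supplied
        detector.dataEnd();

        // Get the detected charset name (null if no charset could be determined)
        String encoding = detector.getDetectedCharset();

        if (encoding != null) {
            System.out.println("Detected encoding: " + encoding);
        } else {
            System.out.println("Could not detect the encoding.");
        }

        // Reset the detector so it can be reused for other data
        detector.reset();
    }
}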

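To detect a file's encoding, read it in chunks and feed every chunk to one detector. The method below keeps the original policy: it returns UTF-8 if the detector reports UTF-8 and otherwise falls back to GBK.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.mozilla.universalchardet.UniversalDetector;

// Illustrative wrapper class name; only getCharset comes from the original
public class FileCharsetUtil {

    public static String getCharset(String filePath) {
        // Default when UTF-8 is not detected (or the file cannot be read)
        String charset = "GBK";
        // One detector for the whole file: it must see all chunks before dataEnd()
        UniversalDetector detector = new UniversalDetector(null);
        try (InputStream fis = new FileInputStream(filePath)) {
            byte[] bytes = new byte[1024];
            int len;
            // Feed only the len bytes actually read; stop early once the detector is certain
            while (!detector.isDone() && (len = fis.read(bytes)) != -1) {
                detector.handleData(bytes, 0, len);
            }
            // Tell the detector there is no more data so it makes its final guess
            detector.dataEnd();
            String encoding = detector.getDetectedCharset();
            if (encoding != null && "UTF-8".equalsIgnoreCase(encoding)) {
                charset = "UTF-8";
            }
        } catch (IOException e) {
            // Keep the GBK default; log the error in real code instead of ignoring it
            e.printStackTrace();
        }
        return charset;
    }
}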
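Because getCharset only decides between UTF-8 and GBK, the same decision can also be sketched without any third-party library: try to decode the bytes as strict UTF-8, and fall back to GBK if decoding fails. This is a different technique from UniversalDetector (a validity check, not statistical detection), and the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical JDK-only alternative to the UniversalDetector-based getCharset
public class Utf8OrGbk {
    // True if the bytes are well-formed UTF-8 (errors are reported, not replaced)
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Same policy as getCharset: UTF-8 if the content decodes as UTF-8, otherwise GBK
    public static String guessCharset(Path path) throws IOException {
        byte[] data = Files.readAllBytes(path);
        return isValidUtf8(data) ? "UTF-8" : "GBK";
    }

    public static void main(String[] args) throws IOException {
        // Write a GBK-encoded temp file, then read it back with the guessed charset
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        Files.write(tmp, "这是一个测试字符串".getBytes(Charset.forName("GBK")));
        String cs = guessCharset(tmp);
        String text = new String(Files.readAllBytes(tmp), Charset.forName(cs));
        System.out.println(cs + ": " + text);
        Files.delete(tmp);
    }
}
```

A pure-ASCII file is valid UTF-8 and is reported as UTF-8, which is harmless because ASCII bytes read the same in GBK. The sketch loads the whole file into memory, which is fine for small files; for large ones, the chunked approach above is a better fit.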
