Fixing the extra question mark at the start of a UTF-8 text file read in Java

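This article presents a Java utility class that detects a file's encoding (UTF-8, UTF-16, and so on) from its byte order mark (BOM), together with a usage example. Using the class simplifies encoding handling when reading files.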


1. Create the utility class

import java.io.*;

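// Reader that picks the character encoding from the stream's byte order mark (BOM)
public class UnicodeReader extends Reader {
  PushbackInputStream internalIn;          // raw bytes; non-BOM bytes are pushed back here
  InputStreamReader   internalIn2 = null;  // decoding reader, created lazily by init()
  String              defaultEnc;          // used when no BOM is found

  private static final int BOM_SIZE = 4;   // the longest BOM (UTF-32) is 4 bytes

  // defaultEnc may be null; the platform default charset is then used as the fallback.
  // Declared public so the class can be used from other packages.
  public UnicodeReader(InputStream in, String defaultEnc) {
     internalIn = new PushbackInputStream(in, BOM_SIZE);
     this.defaultEnc = defaultEnc;
  }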

  public String getDefaultEncoding() {
     return defaultEnc;
  }

  
  public String getEncoding() {
     if (internalIn2 == null) return null;
     return internalIn2.getEncoding();
  }

  
  // Reads up to BOM_SIZE bytes, chooses the encoding from the BOM if there is one,
  // and pushes back every byte that is not part of the BOM.
  protected void init() throws IOException {
     if (internalIn2 != null) return;

     String encoding;
     byte bom[] = new byte[BOM_SIZE];
     int n, unread;
     n = internalIn.read(bom, 0, bom.length);

     // Compare only bytes that were actually read: n is -1 for an empty stream and
     // may be less than 4, in which case the rest of bom[] is zero padding.
     if ( (n >= 4) && (bom[0] == (byte)0x00) && (bom[1] == (byte)0x00) &&
                 (bom[2] == (byte)0xFE) && (bom[3] == (byte)0xFF) ) {
        encoding = "UTF-32BE";
        unread = n - 4;
     } else if ( (n >= 4) && (bom[0] == (byte)0xFF) && (bom[1] == (byte)0xFE) &&
                 (bom[2] == (byte)0x00) && (bom[3] == (byte)0x00) ) {
        encoding = "UTF-32LE";
        unread = n - 4;
     } else if ( (n >= 3) && (bom[0] == (byte)0xEF) && (bom[1] == (byte)0xBB) &&
           (bom[2] == (byte)0xBF) ) {
        encoding = "UTF-8";
        unread = n - 3;
     } else if ( (n >= 2) && (bom[0] == (byte)0xFE) && (bom[1] == (byte)0xFF) ) {
        encoding = "UTF-16BE";
        unread = n - 2;
     } else if ( (n >= 2) && (bom[0] == (byte)0xFF) && (bom[1] == (byte)0xFE) ) {
        encoding = "UTF-16LE";
        unread = n - 2;
     } else {
        // Unicode BOM mark not found, unread all bytes
        encoding = defaultEnc;
        unread = n;
     }

     // Push back the bytes that follow the BOM so the decoder sees them
     if (unread > 0) internalIn.unread(bom, (n - unread), unread);

     // Use the detected encoding, or the platform default if none was given
     if (encoding == null) {
        internalIn2 = new InputStreamReader(internalIn);
     } else {
        internalIn2 = new InputStreamReader(internalIn, encoding);
     }
  }

  public void close() throws IOException {
     init();
     internalIn2.close();
  }

  public int read(char[] cbuf, int off, int len) throws IOException {
     init();
     return internalIn2.read(cbuf, off, len);
  }

}
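The BOM checks inside init() can also be looked at in isolation. The following is a minimal sketch, not part of the original class: BomSniffer, detect, and startsWith are hypothetical names introduced only for illustration. It shows why the 4-byte UTF-32 signatures have to be tested before the 2-byte UTF-16 ones, and why only the bytes that were actually read should be compared.

```java
public class BomSniffer {
    // Returns the encoding implied by a BOM in the first n bytes of head,
    // or null when no BOM is present. (Hypothetical helper for illustration.)
    public static String detect(byte[] head, int n) {
        // Longer signatures first: FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE)
        if (startsWith(head, n, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, n, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, n, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, n, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, n, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // True when at least sig.length bytes were read and they match sig
    private static boolean startsWith(byte[] head, int n, int... sig) {
        if (n < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((head[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'a', 0};
        byte[] plain = {'a', 'b', 'c', 'd'};
        System.out.println(detect(utf8, utf8.length));       // UTF-8
        System.out.println(detect(utf16le, utf16le.length)); // UTF-16LE
        System.out.println(detect(plain, plain.length));     // null
    }
}
```

Passing the number of bytes read (n) separately matters for very short inputs: a 2-byte file containing only FF FE would otherwise be compared against the zero padding of the buffer and look like UTF-32LE.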

2. Read a file with the utility class

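// Requires: import java.nio.charset.Charset;
// sqlFile is the file to read; the second argument is the fallback
// encoding used when the file has no BOM.
BufferedReader br = new BufferedReader(
     new UnicodeReader(
     new FileInputStream(sqlFile),
     Charset.defaultCharset().name()));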



3. Code that produces the question mark

File f = new File("./utf.txt");
FileInputStream in = new FileInputStream(f);
// Read the file explicitly as UTF-8
BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));

String line = br.readLine();
while (line != null)
{
    System.out.println(line);
    line = br.readLine();
}
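To see where the stray character comes from, the sketch below decodes a few in-memory bytes the same way the code above decodes the file. It is an illustration only: BomDemo and firstChar are hypothetical names, and the byte arrays stand in for the contents of a file saved with and without a UTF-8 BOM. Java's UTF-8 decoder does not strip the BOM bytes EF BB BF, so they surface as the character U+FEFF at the start of the first line, which many consoles print as a question mark.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Decodes the given bytes as UTF-8 and returns the first decoded character
    public static int firstChar(byte[] data) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 BOM, followed by the ASCII letter 'a'
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] withoutBom = {'a'};

        System.out.printf("with BOM:    U+%04X%n", firstChar(withBom));    // U+FEFF
        System.out.printf("without BOM: U+%04X%n", firstChar(withoutBom)); // U+0061
    }
}
```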

To fix the problem, write the utility class above and replace new InputStreamReader(in, "UTF-8") with

new UnicodeReader(new FileInputStream(sqlFile),Charset.defaultCharset().name()), which consumes the BOM before the text is decoded.
