Java default io reader does not recognize all BOM markers. It it known to be fixed in JDK6, but I havent tested it yet. You can use UnicodeReader class to overcome problems and auto-recognize bom markers. It will give a transparent behaviour to underlying inputstreams.
Example code using UnicodeReader class
Here is an example method to read text file. It will recognize bom marker and skip it while reading.
public static char[] loadFile(String file) throws IOException {
// read text file, auto recognize bom marker or use
// system default if markers not found.
BufferedReader reader = null;
CharArrayWriter writer = null;
UnicodeReader r = new UnicodeReader(new FileInputStream(file), null);
char[] buffer = new char[16 * 1024]; // 16k buffer
int read;
try {
reader = new BufferedReader(r);
writer = new CharArrayWriter();
while( (read = reader.read(buffer)) != -1) {
writer.write(buffer, 0, read);
}
writer.flush();
return writer.toCharArray();
} catch (IOException ex) {
throw ex;
} finally {
try {
writer.close(); reader.close(); r.close();
} catch (Exception ex) { }
}
}
Example code to write UTF-8 with bom marker
Write bom marker bytes to start of empty file and all proper text editors have no problems using a correct charset while reading files. Java's OutputStreamWriter does not write utf8 bom marker bytes.
public static void saveFile(String file, String data, boolean append) throws IOException {
BufferedWriter bw = null;
OutputStreamWriter osw = null;
File f = new File(file);
FileOutputStream fos = new FileOutputStream(f, append);
try {
// write UTF8 BOM mark if file is empty
if (f.length() < 1) {
final byte[] bom = new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF };
fos.write(bom);
}
osw = new OutputStreamWriter(fos, "UTF-8");
bw = new BufferedWriter(osw);
if (data != null) bw.write(data);
} catch (IOException ex) {
throw ex;
} finally {
try { bw.close(); fos.close(); } catch (Exception ex) { }
}
}
Example code using UnicodeReader class
Here is an example method to read text file. It will recognize bom marker and skip it while reading.
public static char[] loadFile(String file) throws IOException {
// read text file, auto recognize bom marker or use
// system default if markers not found.
BufferedReader reader = null;
CharArrayWriter writer = null;
UnicodeReader r = new UnicodeReader(new FileInputStream(file), null);
char[] buffer = new char[16 * 1024]; // 16k buffer
int read;
try {
reader = new BufferedReader(r);
writer = new CharArrayWriter();
while( (read = reader.read(buffer)) != -1) {
writer.write(buffer, 0, read);
}
writer.flush();
return writer.toCharArray();
} catch (IOException ex) {
throw ex;
} finally {
try {
writer.close(); reader.close(); r.close();
} catch (Exception ex) { }
}
}
Example code to write UTF-8 with bom marker
Write bom marker bytes to start of empty file and all proper text editors have no problems using a correct charset while reading files. Java's OutputStreamWriter does not write utf8 bom marker bytes.
public static void saveFile(String file, String data, boolean append) throws IOException {
BufferedWriter bw = null;
OutputStreamWriter osw = null;
File f = new File(file);
FileOutputStream fos = new FileOutputStream(f, append);
try {
// write UTF8 BOM mark if file is empty
if (f.length() < 1) {
final byte[] bom = new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF };
fos.write(bom);
}
osw = new OutputStreamWriter(fos, "UTF-8");
bw = new BufferedWriter(osw);
if (data != null) bw.write(data);
} catch (IOException ex) {
throw ex;
} finally {
try { bw.close(); fos.close(); } catch (Exception ex) { }
}
}
本文介绍了一种使用Java处理Unicode文件的方法,通过UnicodeReader类自动识别并跳过BOM标记,确保了文件读写的透明性。同时,还提供了一个实用的例子来演示如何实现这一过程。
1527

被折叠的 条评论
为什么被折叠?



