Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-byte UTF-8 sequence.

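This article looks at an XML encoding problem encountered while parsing an interface response. It gives two solutions: change the encoding declared in the XML file to GBK or GB2312, or download the file and handle its encoding locally. Code for both is included to help developers read XML files saved in different encodings correctly.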


The error I ran into while parsing an interface response: Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-byte UTF-8 sequence.

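This is clearly an encoding problem that occurs while reading the XML file.
Testing showed that the main cause is a mismatch between the encoding declared in the XML file and the encoding the file was actually saved in: for example, the declaration says encoding="UTF-8" but the file was saved as GBK.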
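To see the mismatch in isolation, here is a minimal sketch. It assumes the JDK's built-in DOM parser as a stand-in for dom4j, so it runs without extra jars, and the class name MismatchDemo is hypothetical. It encodes a document as GBK while the declaration still says UTF-8. On the JDK's bundled Xerces parser this should fail with a message like the one above, but the exact exception type and wording can differ between JDK versions, so the sketch only reports whether parsing failed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

public class MismatchDemo {
    // Builds a document whose declaration claims UTF-8 but whose bytes are GBK.
    static byte[] mismatchedXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u4e2d\u6587</root>";
        return xml.getBytes(Charset.forName("GBK"));
    }

    // Returns true if the JDK parser rejects the bytes.
    static boolean parseFails(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (Exception e) {
            System.out.println("Parse failed: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFails(mismatchedXml()));
    }
}
```

The GBK bytes for the two Chinese characters start with 0xD6 0xD0. A UTF-8 decoder reads 0xD6 as the lead byte of a 2-byte sequence, but 0xD0 is not a valid continuation byte, which is exactly what "invalid byte 2 of 2-byte sequence" means.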

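There are several ways to fix this. I will cover the two I have tested.
If you read the XML directly as a file, change the UTF-8 in the file's XML declaration to GBK or GB2312, that is, to the encoding the file was actually saved in.
The other case is reading the XML from a URL: you get an InputStream from the network address and convert it into a Document object. The fix here is to download the file first and save it locally. This is easy: write the stream to a directory of your choice with an OutputStream. Then parse the downloaded file, handling its encoding after SAXReader saxReader = new SAXReader(); and before Document document = saxReader.read(new File(file)); by calling the methods below.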
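For the first fix (editing the declaration), the change can also be made in code. The sketch below assumes the file is saved as GBK and declares encoding="UTF-8" in exactly that form. The helper names redeclare and rootText are hypothetical, and the JDK DOM parser stands in for dom4j. Decoding the bytes as ISO-8859-1 maps each byte to exactly one char, so every byte except the declaration survives the round trip unchanged.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class FixDeclaration {
    // Rewrites encoding="<from>" to encoding="<to>" inside the XML declaration only.
    // ISO-8859-1 maps each byte 0x00-0xFF to one char, so all other bytes are kept as-is.
    static byte[] redeclare(byte[] xml, String from, String to) {
        String latin1 = new String(xml, StandardCharsets.ISO_8859_1);
        String oldDecl = "encoding=\"" + from + "\"";
        int declEnd = latin1.indexOf("?>");
        int pos = latin1.indexOf(oldDecl);
        if (pos < 0 || declEnd < 0 || pos > declEnd) {
            return xml; // no matching declaration in the prolog: leave the bytes alone
        }
        String fixed = latin1.substring(0, pos) + "encoding=\"" + to + "\""
                + latin1.substring(pos + oldDecl.length());
        return fixed.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the bytes with the JDK DOM parser and returns the root element's text.
    static String rootText(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u4e2d\u6587</root>";
        byte[] savedAsGbk = xml.getBytes(Charset.forName("GBK")); // declared UTF-8, saved as GBK
        byte[] fixed = redeclare(savedAsGbk, "UTF-8", "GBK");
        System.out.println(rootText(fixed));
    }
}
```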
The methods:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Guess the encoding of an uploaded file.
 * Checks for a UTF-16/UTF-8 byte order mark first, then scans the bytes
 * for UTF-8 multi-byte patterns; defaults to GBK.
 */
public static String get_charset(File file) {
    String charset = "GBK";
    byte[] first3Bytes = new byte[3];
    // try-with-resources closes the stream on every path, including the early return
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        boolean checked = false;
        bis.mark(3); // allow reset() after peeking at the first 3 bytes
        int read = bis.read(first3Bytes, 0, 3);
        if (read == -1) {
            return charset; // empty file
        }
        if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
            charset = "UTF-16LE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
            charset = "UTF-16BE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xEF
                && first3Bytes[1] == (byte) 0xBB
                && first3Bytes[2] == (byte) 0xBF) {
            charset = "UTF-8";
            checked = true;
        }
        bis.reset();
        if (!checked) {
            while ((read = bis.read()) != -1) {
                if (read >= 0xF0) {
                    break; // stop scanning; charset stays GBK
                }
                if (0x80 <= read && read <= 0xBF) {
                    break; // a lone byte in 0x80-0xBF also counts as GBK
                }
                if (0xC0 <= read && read <= 0xDF) {
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        continue; // 2-byte pattern (0xC0-0xDF)(0x80-0xBF); may also be GBK
                    }
                    break;
                } else if (0xE0 <= read && read <= 0xEF) {
                    // 3-byte pattern: can still be misjudged, but that is unlikely
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) {
                            charset = "UTF-8";
                        }
                    }
                    break;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return charset;
}
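get_charset is a heuristic. It returns UTF-8 as soon as it sees one well-formed 3-byte sequence, and it stops at the first byte pattern that does not look like UTF-8. A different technique, swapped in here and not the author's, is to run a strict UTF-8 decode over the whole buffer and fall back to GBK only if that fails. The sketch below assumes the data fits in memory and that GBK is the only non-UTF-8 encoding you expect. StrictCharsetGuess and guessCharset are hypothetical names. Very short GBK text can occasionally be valid UTF-8 by coincidence, so the result is still a guess.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictCharsetGuess {
    // UTF-16 BOMs first, then a strict UTF-8 decode of the whole buffer; anything else is assumed GBK.
    static String guessCharset(byte[] data) {
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT) // throw instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(data)); // a UTF-8 BOM (EF BB BF) is valid UTF-8 too
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return "GBK"; // not valid UTF-8: fall back to the same default as get_charset
        }
    }
}
```

One behavioral difference: pure ASCII input comes back as UTF-8 here, while get_charset returns GBK. For ASCII-only files the two encodings decode identically, so the difference does not matter in practice.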




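import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

/**
 * Simple download: save the resource at strUrl to a local directory of your choice.
 */
public static void writeFile(String strUrl, String filePath, String fileName) {
    File dir = new File(filePath);
    dir.mkdirs();
    // new File(dir, fileName) adds the path separator that filePath + fileName could miss;
    // try-with-resources closes both streams, which the original version never did
    try (InputStream is = new URL(strUrl).openStream();
         OutputStream os = new FileOutputStream(new File(dir, fileName))) {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = is.read(buffer)) != -1) {
            os.write(buffer, 0, bytesRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}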
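On Java 7 and later, the copy loop in writeFile can be handed to java.nio.file.Files.copy. The minimal sketch below assumes java.nio is available. DownloadSketch, saveToLocal and download are hypothetical names. saveToLocal accepts any InputStream, so the download step can be tested without a network connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DownloadSketch {
    // Copies any InputStream (e.g. url.openStream()) to dir/fileName, creating dir if needed.
    static Path saveToLocal(InputStream in, String dir, String fileName) throws IOException {
        Path target = Paths.get(dir, fileName);
        Files.createDirectories(target.getParent());
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Hypothetical wrapper with the same parameters as writeFile; the stream is always closed.
    static Path download(String strUrl, String dir, String fileName) throws IOException {
        try (InputStream in = new URL(strUrl).openStream()) {
            return saveToLocal(in, dir, fileName);
        }
    }
}
```

Unlike writeFile, these methods let the IOException propagate. The caller then learns that the download failed instead of going on to parse a missing or partial file.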




import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Read an InputStream completely into a byte array.
 * (The actual decoding happens later, in InputStreamReader.)
 */
private static byte[] InputStreamToByte(InputStream is) throws IOException {
    ByteArrayOutputStream byteArrOut = new ByteArrayOutputStream();
    byte[] temp = new byte[1024];
    int len;
    while ((len = is.read(temp, 0, 1024)) != -1) {
        byteArrOut.write(temp, 0, len);
    }
    return byteArrOut.toByteArray();
}
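Since Java 9, InputStream.readAllBytes() does the same job as the manual loop in InputStreamToByte. The sketch below assumes Java 9+ and uses the hypothetical class name StreamBytes. It also shows why buffering is useful: once the bytes are in memory, you can inspect them to choose a charset and then decode them with an InputStreamReader, without reading the file a second time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class StreamBytes {
    // Java 9+: readAllBytes() replaces the manual copy loop; the stream is closed afterwards.
    static byte[] toBytes(InputStream is) throws IOException {
        try (InputStream in = is) {
            return in.readAllBytes();
        }
    }

    // Decodes the buffered bytes with an explicitly chosen charset, e.g. "GBK".
    static Reader asReader(byte[] bytes, String charsetName) {
        return new InputStreamReader(new ByteArrayInputStream(bytes), Charset.forName(charsetName));
    }
}
```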

The parameter InputStream is can be any InputStream. If you only have a file path, wrap it in a FileInputStream yourself.

In the test class,
you can then change the first version below into the second one (your own code may be structured differently):

SAXReader sax = new SAXReader(); // dom4j reader
Document document = sax.read(new File(file));
Element element = document.getRootElement();
System.out.println(element.getName());


SAXReader saxReader = new SAXReader();
// Encoding conversion: buffer the raw bytes, then decode them explicitly as GBK
// (or with the charset returned by get_charset(new File(file)))
byte[] bytes;
try (InputStream fileIn = new FileInputStream(file)) {
    bytes = InputStreamToByte(fileIn);
}
InputStream in = new ByteArrayInputStream(bytes);
InputStreamReader strInStream = new InputStreamReader(in, "GBK");
Document root = saxReader.read(strInStream);
Element element = root.getRootElement();
System.out.println(element.getName());



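With this change, the root element name prints correctly.
Best of all, this approach based on the stream-to-bytes method (InputStreamToByte) is much simpler than many of the solutions found online.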
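If dom4j is not on the classpath, the same idea works with the JDK's own DOM parser. The sketch below assumes the file's real encoding is already known, for example GBK or whatever get_charset returns. ReaderParse and parseWithCharset are hypothetical names. The parser receives a character stream through org.xml.sax.InputSource, so it never decodes bytes itself. According to the InputSource documentation, a character stream takes precedence, and the encoding declaration inside the document is disregarded. dom4j's SAXReader.read(Reader) also parses from the Reader, so it should behave the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderParse {
    // Decode the bytes ourselves, then hand the parser characters instead of bytes,
    // so the encoding named in the XML declaration is no longer used for decoding.
    static Document parseWithCharset(byte[] xml, String charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), Charset.forName(charset))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML as " + charset, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u4e2d\u6587</root>";
        byte[] gbkBytes = xml.getBytes(Charset.forName("GBK")); // declared UTF-8, saved as GBK
        Document doc = parseWithCharset(gbkBytes, "GBK");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```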