Reposted from my own post on the company blog:
[url]http://pt.alibaba-inc.com/wp/experience_929/hessian-big-string-serialize-problems.html[/url]
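Our site showed a strange symptom. In production, some Offer records kept failing to deserialize, yet the problem never appeared in the test environment.
Remote-debugging production showed that Hessian 3.2.1 fails when it hits the 0x33 tag. Tracing into the code revealed the following.
When Hessian 3.2.1 serializes a large String, it splits it into 32k chunks. The last, partial chunk is encoded in one of three ways depending on its length (the lengths below are in characters, which equal bytes for ASCII text):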
[b](1) Chunk of 1 to 31 bytes:[/b]
A single byte holds the length (0x00 to 0x1f, so one byte can express at most 31), followed by the data.
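[b](2) Chunk of 32 to 1023 bytes:[/b]
Two bytes hold the length, followed by the data. A length of at most 1023 (0x3FF) needs only the low 4 bits of the high byte plus the whole low byte, so to avoid wasting space the other 4 bits of the high byte are reused as a flag: the first byte is 0x30 + (length >> 8), i.e. one of 0x30 to 0x33.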
[b](3) Chunk of 1024 to 32k-1 bytes:[/b]
The chunk is marked with an 'S' prefix followed by a two-byte length, and is read as a regular chunk.
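The three encodings above can be sketched as a small header encoder. This is a minimal illustration under stated assumptions, not Hessian's own source: the class and method names (FinalChunkHeader, encode) are hypothetical, lengths count characters (equal to bytes for ASCII), and case (3) is written as an 'S' tag followed by a two-byte big-endian length, matching the case 'S' branch of parseChar() shown later.

[code]/**
 * Hypothetical sketch of the three final-chunk header forms described above.
 * Not Hessian's code; lengths are in chars (equal to bytes for ASCII).
 */
public class FinalChunkHeader {
    static final int CHUNK_SIZE = 0x8000; // 32k

    /** Returns the header bytes written before the data of a final chunk. */
    static byte[] encode(int length) {
        if (length < 0 || length > CHUNK_SIZE) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        if (length <= 0x1f) {
            // case (1): the single byte 0x00..0x1f is the length itself
            return new byte[] { (byte) length };
        } else if (length <= 0x3ff) {
            // case (2): flag 0x3 in the high nibble, top length bits (0..3) in the
            // low nibble, low 8 bits of the length in the second byte
            return new byte[] { (byte) (0x30 + (length >> 8)), (byte) length };
        } else {
            // case (3), assumed: 'S' followed by a 16-bit big-endian length
            return new byte[] { (byte) 'S', (byte) (length >> 8), (byte) length };
        }
    }

    public static void main(String[] args) {
        for (int len : new int[] { 1, 31, 32, 512, 1023, 1024, CHUNK_SIZE - 1 }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(len)) {
                hex.append(String.format("%02x ", b & 0xff));
            }
            System.out.println(len + " -> " + hex.toString().trim());
        }
    }
}[/code]

Running it prints lines such as "32 -> 30 20", "1023 -> 33 ff" and "1024 -> 53 04 00". It also shows that the 0x33 tag in the production error corresponds to a final chunk of 768 to 1023 characters.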
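The bug is in case (2): Hessian2Input never strips those 4 packed flag bits. Once the first 32k chunk of a long string is consumed, parseChar() reads the header of the next chunk, but it has no branch for the 0x30 to 0x33 tags, so it throws "expected string" instead.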
[b](1) Exception:[/b] (the string at the failing position has been replaced with xxx)
[code]expected string at 0x33 java.lang.String (xxx)
at com.caucho.hessian.io.Hessian2Input.error(Hessian2Input.java:2714)
at com.caucho.hessian.io.Hessian2Input.expect(Hessian2Input.java:2685)
at com.caucho.hessian.io.Hessian2Input.parseChar(Hessian2Input.java:2442)
at com.caucho.hessian.io.Hessian2Input.readString(Hessian2Input.java:1285)
at com.caucho.hessian.io.JavaDeserializer$StringFieldDeserializer.deserialize(JavaDeserializer.java:580)
... 21 more
at com.caucho.hessian.io.JavaDeserializer.logDeserializeError(JavaDeserializer.java:671)
at com.caucho.hessian.io.JavaDeserializer$StringFieldDeserializer.deserialize(JavaDeserializer.java:584)
at com.caucho.hessian.io.JavaDeserializer.readObject(JavaDeserializer.java:233)
at com.caucho.hessian.io.JavaDeserializer.readObject(JavaDeserializer.java:157)
at com.caucho.hessian.io.Hessian2Input.readObjectInstance(Hessian2Input.java:2067)
at com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:1592)
at com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:1576)
... 14 more[/code]
[b](2) Problem code:[/b] (Hessian2Input.parseChar(), shown with the fix applied)
[code]private int parseChar() throws IOException
{
    while (_chunkLength <= 0) {
        if (_isLastChunk)
            return -1;

        int code = _offset < _length ? (_buffer[_offset++] & 0xff) : read();

        switch (code) {
        case BC_STRING_CHUNK:
            // non-final 32k chunk: a 16-bit length follows
            _isLastChunk = false;
            _chunkLength = (read() << 8) + read();
            break;

        case 'S':
            // final chunk longer than 1023 chars: a 16-bit length follows
            _isLastChunk = true;
            _chunkLength = (read() << 8) + read();
            break;

        case 0x00: case 0x01: case 0x02: case 0x03:
        case 0x04: case 0x05: case 0x06: case 0x07:
        case 0x08: case 0x09: case 0x0a: case 0x0b:
        case 0x0c: case 0x0d: case 0x0e: case 0x0f:
        case 0x10: case 0x11: case 0x12: case 0x13:
        case 0x14: case 0x15: case 0x16: case 0x17:
        case 0x18: case 0x19: case 0x1a: case 0x1b:
        case 0x1c: case 0x1d: case 0x1e: case 0x1f:
            // final chunk of 0 to 31 chars: the tag byte is the length
            _isLastChunk = true;
            _chunkLength = code - 0x00;
            break;

        // Root cause: the original code did not handle the 4 flag bits packed
        // into the tag byte when the final chunk is between 0x1F and 1K long.
        // The following four lines are the added fix:
        case 0x30: case 0x31: case 0x32: case 0x33:
            _isLastChunk = true;
            _chunkLength = ((code - 0x30) << 8) + read();
            break;

        default:
            throw expect("string", code);
        }
    }

    _chunkLength--;
    return parseUTF8Char();
}[/code]
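To check the fixed branch in isolation, here is a standalone sketch that decodes a final-chunk header from a byte stream with the same switch logic as the patched parseChar(), then round-trips every length from 0 to 32768. ChunkHeaderCodec, encode and decodeLength are hypothetical names. The encoder assumes the three header forms described above, and the sketch does not call into Hessian.

[code]import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical standalone model of the patched final-chunk header decoding. */
public class ChunkHeaderCodec {
    // Encoder for the three final-chunk header forms (assumed, see above).
    static byte[] encode(int length) {
        if (length <= 0x1f) return new byte[] { (byte) length };
        if (length <= 0x3ff) return new byte[] { (byte) (0x30 + (length >> 8)), (byte) length };
        return new byte[] { (byte) 'S', (byte) (length >> 8), (byte) length };
    }

    // Reads one tag and returns the chunk length, like the switch in parseChar().
    static int decodeLength(InputStream in) throws IOException {
        int code = in.read();
        switch (code) {
            case 'S':
                return (in.read() << 8) + in.read();
            case 0x30: case 0x31: case 0x32: case 0x33:
                // the fix: drop the 0x30 flag, keep the top length bits
                return ((code - 0x30) << 8) + in.read();
            default:
                if (code >= 0x00 && code <= 0x1f) {
                    return code; // the tag is the length
                }
                throw new IOException("expected string at 0x" + Integer.toHexString(code));
        }
    }

    public static void main(String[] args) throws IOException {
        for (int len = 0; len <= 0x8000; len++) {
            int decoded = decodeLength(new ByteArrayInputStream(encode(len)));
            if (decoded != len) {
                throw new AssertionError("length " + len + " decoded as " + decoded);
            }
        }
        System.out.println("all final-chunk lengths 0..32768 round-trip");
    }
}[/code]

Removing the case 0x30 to 0x33 branch makes the loop stop at length 32 with an IOException, which mirrors the "expected string" failure seen in production.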
[b](3) Test code:[/b]
[code]import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import com.caucho.hessian.io.SerializerFactory;

public class HessianBigStringTest {

    // OK/ERROR refer to the unpatched Hessian 3.2.1 Hessian2Input
    public static void main(String[] args) throws IOException {
        test(1024 * 32);        // OK
        test(1024 * 32 + 1);    // OK
        test(1024 * 32 + 31);   // OK
        test(1024 * 32 + 32);   // ERROR (final chunk tag 0x30)
        test(1024 * 32 + 512);  // ERROR (final chunk tag 0x32)
        test(1024 * 32 + 1023); // ERROR (final chunk tag 0x33)
        test(1024 * 33);        // OK
    }

    // Serializes a string of `size` 'A' characters with Hessian2Output and reads it back.
    public static void test(int size) throws IOException {
        SerializerFactory responseSerializerFactory = new SerializerFactory();
        StringBuilder buf = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            buf.append('A');
        }
        String str = buf.toString();
        System.out.println("length: " + str.getBytes().length);

        ByteArrayOutputStream byteBuffer = new ByteArrayOutputStream(2048);
        Hessian2Output hessianOutput = new Hessian2Output(byteBuffer);
        hessianOutput.setSerializerFactory(responseSerializerFactory);
        hessianOutput.writeObject(str);
        hessianOutput.flush();
        byte[] bytes = byteBuffer.toByteArray();

        ByteArrayInputStream input = new ByteArrayInputStream(bytes);
        Hessian2Input hessianInput = new Hessian2Input(input);
        hessianInput.setSerializerFactory(responseSerializerFactory);
        String result = (String) hessianInput.readObject(String.class);
        System.out.println("result: " + result);
    }
}[/code]
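The OK/ERROR pattern in the test above can be predicted without running Hessian. The sketch below (hypothetical class ChunkPredictor) makes two assumptions. First, the writer emits full 0x8000-character chunks while more than 0x8000 characters remain. Second, a string that fits in a single chunk is decoded by readString() directly rather than through parseChar()'s chunk loop; otherwise every 32 to 1023 character string would fail. Under those assumptions it reports the final chunk's tag and whether the unpatched parseChar() would reject it.

[code]/**
 * Hypothetical predictor for the test sizes above. Assumes full 0x8000-char
 * chunks are written while more than 0x8000 chars remain.
 */
public class ChunkPredictor {
    static final int CHUNK_SIZE = 0x8000;

    // Number of full 32k chunks written before the final chunk.
    static int fullChunks(int n) {
        int count = 0;
        while (n > CHUNK_SIZE) {
            n -= CHUNK_SIZE;
            count++;
        }
        return count;
    }

    // Tag byte that starts the final chunk.
    static int finalTag(int n) {
        int rest = n - fullChunks(n) * CHUNK_SIZE;
        if (rest <= 0x1f) return rest;
        if (rest <= 0x3ff) return 0x30 + (rest >> 8);
        return 'S';
    }

    // The unpatched parseChar() only sees the final tag after a full chunk,
    // and it has no branch for 0x30..0x33.
    static boolean failsUnpatched(int n) {
        int tag = finalTag(n);
        return fullChunks(n) > 0 && tag >= 0x30 && tag <= 0x33;
    }

    public static void main(String[] args) {
        int[] sizes = { 1024 * 32, 1024 * 32 + 1, 1024 * 32 + 31, 1024 * 32 + 32,
                        1024 * 32 + 512, 1024 * 32 + 1023, 1024 * 33 };
        for (int n : sizes) {
            System.out.printf("%d: tag 0x%02x -> %s%n", n, finalTag(n),
                    failsUnpatched(n) ? "ERROR" : "OK");
        }
    }
}[/code]

For the seven sizes in main(), it yields the same OK/ERROR sequence as the comments in the test code: only 32768+32, 32768+512 and 32768+1023 end in a 0x30 to 0x33 tag after a full chunk.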