I recently needed to write a web service API for the front end to call, for uploading file contents. The API is:
public void upload(String requestor, Script script, String content, boolean overwrite)
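All parameters of the API are JSON-encoded. The original requirement was to upload text files, such as scripts, so the encoding of the content parameter was fixed as UTF-8. Later we also needed to upload binary files such as jar packages, and UTF-8 no longer worked: some byte sequences are not valid UTF-8, and some are treated by JSON as illegal characters. From an answer on Stack Overflow: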
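To see why sending raw bytes as UTF-8 text breaks, here is a minimal sketch that decodes a byte array as UTF-8 and checks whether the original bytes survive the round trip. The class and method names are illustrative, and the sample "binary" input is assumed to be the zip/jar signature `PK\3\4` followed by a `0xFF` byte, which never occurs in valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode the bytes as UTF-8, re-encode them, and report whether the original bytes survive.
    static boolean survivesUtf8(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8); // invalid sequences become U+FFFD
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] script = "echo hello".getBytes(StandardCharsets.UTF_8);
        // Illustrative binary data: zip/jar signature "PK\3\4" plus a lone 0xFF byte
        byte[] binary = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF};
        System.out.println(survivesUtf8(script)); // true
        System.out.println(survivesUtf8(binary)); // false
    }
}
```

The script text passes through unchanged, while `0xFF` is silently replaced by U+FFFD, so the jar bytes are lost. The `0x03` and `0x04` bytes are control characters, which would additionally need escaping inside a JSON string.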
The problem with UTF-8 is that it is not the most space-efficient encoding. Also, some random binary byte sequences are invalid UTF-8 encoding. So you can't just interpret a random binary byte sequence as some UTF-8 data, because it will be invalid UTF-8 encoding. The benefit of this constraint on the UTF-8 encoding is that it makes it robust and makes it possible to locate where multi-byte chars start and end, whatever byte we start looking at.
As a consequence, if encoding a byte value in the range [0..127] would need only one byte in UTF-8 encoding, encoding a byte value in the range [128..255] would require 2 bytes! Worse than that, in JSON, control chars, " and \ are not allowed to appear in a string. So the binary data would require some transformation to be properly encoded.
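The size arithmetic in the quote can be checked with a small sketch. It assumes one particular way of "interpreting" binary data as text, mapping each byte value 0..255 to the character with the same code point, measures the UTF-8 size of the result, and compares it with the size of the Base64 form of the same 256 bytes. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodingSize {
    // Map each byte 1:1 to the char with the same value and measure the UTF-8 size of the text.
    static int utf8SizeOfByteChars(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) (b & 0xFF)); // bytes 128..255 become U+0080..U+00FF
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Base64 output is pure ASCII, so its UTF-8 size equals its character count.
    static int base64Size(byte[] data) {
        return Base64.getEncoder().encodeToString(data).length();
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(utf8SizeOfByteChars(all)); // 384 = 128 * 1 + 128 * 2
        System.out.println(base64Size(all));          // 344 = 4 * ceil(256 / 3)
    }
}
```

For this uniform input the character mapping costs 384 bytes even before escaping the control characters U+0000 to U+001F, `"` and `\` for JSON, while Base64 costs a fixed 4 output bytes per 3 input bytes (344 here, including `=` padding). For mostly-ASCII input the comparison would come out differently.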
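The solution is to replace UTF-8 with Base64 encoding. Java 6 has built-in Base64 encoding and decoding through javax.xml.bind.DatatypeConverter: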
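// Decode to raw bytes; wrapping them in new String(...) with the platform default charset can corrupt binary data such as jars
byte[] data = DatatypeConverter.parseBase64Binary(content);
// Encode the stored bytes as Base64 before returning them in a JSON response
String encoded = DatatypeConverter.printBase64Binary(scriptfao.read(script).getBytes());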
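DatatypeConverter is part of JAXB, which was removed from the JDK in Java 11; on Java 8 and later, `java.util.Base64` does the same job without extra dependencies. The following is a minimal sketch of the round trip under that assumption: the client encodes the file bytes into the `content` parameter and the server decodes them back into an identical `byte[]`. The class, the method names and the hand-built JSON string are hypothetical; real code would use a JSON library and the actual `upload` implementation.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Upload {
    // Client side (hypothetical): turn raw file bytes into a JSON-safe string for "content".
    static String encodeContent(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Server side (hypothetical, inside upload): recover the original bytes.
    // Keep the result as byte[] and write it out as-is rather than converting it to a String.
    static byte[] decodeContent(String content) {
        return Base64.getDecoder().decode(content);
    }

    public static void main(String[] args) {
        // Jar-like bytes including a NUL, an invalid UTF-8 byte, a quote and a backslash
        byte[] jarLike = {0x50, 0x4B, 0x03, 0x04, 0x00, (byte) 0xFF, '"', '\\'};
        String content = encodeContent(jarLike);
        // Only A-Z, a-z, 0-9, '+', '/' and '=' appear, so nothing needs JSON escaping
        System.out.println("{\"content\":\"" + content + "\"}");
        System.out.println(Arrays.equals(jarLike, decodeContent(content))); // true
    }
}
```

Because the Base64 alphabet contains no control characters, quotes or backslashes, the encoded string can be placed in a JSON string unchanged, and the decoded bytes match the original exactly.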