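When Tigo read files from TFS, the content came back garbled, and the symptoms were puzzling:
1. In the test environment, the text was sometimes correct and sometimes garbled.
2. In production, it was always garbled.
The cause turned out to be how HttpClient was used to fetch files from TFS: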
String html = org.apache.http.util.EntityUtils.toString(entity, defaultCharset);
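Garbled text of this kind usually means the bytes were decoded with a different charset than the one they were encoded in. The minimal sketch below reproduces the effect in plain Java, assuming for illustration that the file is UTF-8 and gets decoded as ISO-8859-1. MojibakeDemo and roundTrip are hypothetical names, not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text with one charset, then decode the bytes with another.
    // A mismatch between the two is what produces garbled output.
    public static String roundTrip(String text, Charset encodeAs, Charset decodeAs) {
        byte[] bytes = text.getBytes(encodeAs);
        return new String(bytes, decodeAs);
    }

    public static void main(String[] args) {
        // The two-character Chinese word for "Chinese", written with escapes
        String original = "\u4e2d\u6587";

        // Same charset on both sides: the text survives intact
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // UTF-8 bytes decoded as ISO-8859-1: six garbled characters, one per byte
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled);

        // Reversing the mistake recovers the original in this particular case
        System.out.println(roundTrip(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, this specific mismatch is reversible; that does not hold for every charset pair, so decoding correctly in the first place is the real fix.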
The relevant part of EntityUtils.toString is implemented as follows:
public static String toString(
        final HttpEntity entity, final Charset defaultCharset) throws IOException, ParseException {
    // ...
    ContentType contentType = ContentType.getOrDefault(entity);
    Charset charset = contentType.getCharset();
    if (charset == null) {
        charset = defaultCharset;
    }
    if (charset == null) {
        charset = HTTP.DEF_CONTENT_CHARSET;
    }
    Reader reader = new InputStreamReader(instream, charset);
    // ...
}
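To make the precedence in that excerpt explicit, here is a hypothetical, simplified re-implementation in plain Java. CharsetPrecedence and resolveCharset are illustrative names, not HttpCore API, and the method takes the raw Content-Type header string instead of an HttpEntity so it runs without HttpClient on the classpath. It models only the null-checks visible above: a missing header is treated as "no charset", which may not match what ContentType.getOrDefault returns in every HttpCore version, so check that against the version in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetPrecedence {
    // Hypothetical sketch of the lookup order: charset declared in Content-Type,
    // then the caller's defaultCharset, then ISO-8859-1 (HttpCore's HTTP.DEF_CONTENT_CHARSET).
    // Real header parsing and unsupported-charset handling are more involved.
    public static Charset resolveCharset(String contentTypeHeader, Charset defaultCharset) {
        Charset charset = null;
        if (contentTypeHeader != null) {
            for (String part : contentTypeHeader.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim();
                    // Strip optional surrounding quotes, e.g. charset="UTF-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    charset = Charset.forName(name);
                }
            }
        }
        if (charset == null) {
            charset = defaultCharset; // only reached when the server declared no charset
        }
        if (charset == null) {
            charset = StandardCharsets.ISO_8859_1; // last resort
        }
        return charset;
    }

    public static void main(String[] args) {
        // The server's declaration wins over the caller's default
        System.out.println(resolveCharset("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8)); // ISO-8859-1
        // No charset in the header: the caller's default is used
        System.out.println(resolveCharset("text/html", StandardCharsets.UTF_8)); // UTF-8
        // Nothing at all: fall back to ISO-8859-1
        System.out.println(resolveCharset(null, null)); // ISO-8859-1
    }
}
```

The first call is the case that bites here: whatever the caller passes as defaultCharset, a charset declared by the server takes precedence.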
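In other words, the charset EntityUtils actually uses is not necessarily defaultCharset. The charset returned by ContentType.getOrDefault(entity).getCharset() takes precedence, and that value is determined by the encoding the TFS server reports; defaultCharset is only a fallback. The fix is simple: drop EntityUtils.toString and decode the stream with the charset you want:
StringBuilder sb = new StringBuilder();
// Always decode with defaultCharset, ignoring any charset the server declares
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(entity.getContent(), defaultCharset))) {
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line).append("\r\n"); // note: every line ending becomes \r\n
    }
}
String html = sb.toString();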
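The readLine() loop rewrites every line terminator as \r\n and appends one after the last line, so the result is not byte-for-byte the original text. If that matters, a minimal alternative, assuming the body fits in memory, is to read the raw bytes and decode them once. With HttpClient the stream would be entity.getContent(); in HttpCore 4.x, new String(EntityUtils.toByteArray(entity), defaultCharset) should behave the same way, since toByteArray does no charset decoding. DecodeWithCharset and readAll below are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeWithCharset {
    // Read the whole stream as bytes, then decode once with the caller's charset.
    // Decoding after buffering means multi-byte characters split across read()
    // chunks are still decoded correctly, and line endings are left untouched.
    // The caller is responsible for closing the stream.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(): UTF-8 bytes with mixed line endings
        byte[] body = "\u4e2d\u6587\r\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(body)) {
            String html = readAll(in, StandardCharsets.UTF_8);
            System.out.print(html);
        }
    }
}
```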