Messy code (garbled text) issues

Source: https://www.ibm.com/developerworks/cn/java/analysis-and-summary-of-common-random-code-problems/index.html?cm_mmc=dwchina--homepage--social--weibo#N101F9

Causes

  1. Encoding

  2. Decoding

  3. Missing fonts

Analysis of the symptoms

  1. Caused by encoding

    On English Windows, create a .txt file in Notepad, type “你好” and save it. When you open it again, you see “??”.

    • Reason:
      Notepad saves as “ANSI” by default, i.e. in the system's ANSI code page. For an English locale this is code page 1252 (Windows-1252, essentially ISO-8859-1; 437 is the OEM code page used by the console). It contains no Chinese characters, so each Chinese character is replaced by 0x3F, giving the bytes “3F 3F”, and 0x3F is the code of “?”.

    • Solution:
      No decoder can display the right characters, because the bytes on disk are already “?”. Choose a suitable encoding when saving text that contains double-byte characters: GB2312 or UTF-8 for Simplified Chinese, Big5 or UTF-8 for Traditional Chinese. Chinese users can also change the system locale to Chinese.

  2. Caused by decoding

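    Create a .txt file containing “你好” on Chinese Windows, copy it to English Windows and open it there: the text is garbled.

    • Reason:
      On Chinese Windows, Notepad saves “ANSI” text as GB2312 (code page 936). On English Windows, Notepad decodes the same bytes with the English ANSI code page (Windows-1252, essentially ISO-8859-1), so each GB2312 byte is shown as a separate Latin character.

    • Solution:
      Decode the file with the encoding it was saved in (GB2312 here).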

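  3. Caused by the application

    Running the Chinese version of uedit32.exe (UltraEdit) on English Windows shows garbled text.

    • Reason: Windows passes text as Unicode to applications that support Unicode; for applications that do not, it uses the ANSI code page, which is determined by the system locale (the language setting for non-Unicode programs).

    • Solution: In Regional and Language Options, set both Standards and formats and the language for non-Unicode programs to Chinese (Simplified). The system then decodes the application's ANSI text with the Simplified Chinese code page.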

  4. Caused by lack of font

    Opening a file shows square boxes instead of characters.

    • Reason: Displaying text takes two steps: the byte sequence is decoded into code points, then each code point is looked up in a font and its glyph is rendered on screen. If the font has no glyph for a code point, a square box is shown instead.

    • Solution: Install a font that covers those characters.
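The first two cases (lost bytes when encoding, wrong charset when decoding) can be reproduced in a few lines of Java. This is a minimal sketch, not code from the source article: the class and method names are made up, it assumes a standard JDK build that ships the GB2312 charset, and “你好” is written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MojibakeDemo {
    static final String TEXT = "\u4f60\u597d"; // the two characters ni hao
    static final Charset GB2312 = Charset.forName("GB2312");

    // Case 1 (encoding): ISO-8859-1 has no Chinese characters, so getBytes()
    // substitutes the charset's replacement byte '?' (0x3F) for each one.
    static byte[] encodeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Case 2 (decoding): the GB2312 bytes are intact but read with the wrong charset.
    static String decodeGb2312BytesAsLatin1(String s) {
        return new String(s.getBytes(GB2312), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte to exactly one char and back, so the original
    // bytes can be recovered from the garbled string and decoded correctly.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        byte[] lost = encodeAsLatin1(TEXT);
        System.out.println(Arrays.toString(lost)); // [63, 63]
        // No decoder helps here: the bytes really are "??"
        System.out.println(new String(lost, StandardCharsets.ISO_8859_1)); // ??

        String garbled = decodeGb2312BytesAsLatin1(TEXT);
        System.out.println(garbled.length());             // 4, one char per GB2312 byte
        System.out.println(repair(garbled).equals(TEXT)); // true
    }
}
```

The repair in case 2 only works because ISO-8859-1 maps all 256 byte values; case 1 cannot be repaired, since the information was discarded when the file was written.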

From a coding perspective

1. I/O operations: reading is decoding (byte → character), while writing is encoding (character → byte).

  1. The Java I/O interfaces:

    [Figure: Java I/O interfaces]

    When Writer and FileOutputStream are used together:

    [Figure: file I/O streams]

  2. String.getBytes.

    String.getBytes(): encodes this String into a sequence of bytes using the platform’s default charset (Charset.defaultCharset(), which is determined by the system property file.encoding), storing the result in a new byte array.

    Note: if the JVM’s file.encoding is not set explicitly, it depends on the environment that starts the JVM: launched from cmd, it follows the system's regional settings, while Eclipse lets you set it in the run configuration. Since JDK 18 the default charset is UTF-8 regardless of locale, unless file.encoding is set to COMPAT.
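A quick way to see which charset the no-argument String.getBytes() uses, and why byte counts differ from character counts, is to print both. The class name is hypothetical; the GBK lookup assumes a standard JDK build, and the default-charset lines print whatever the current environment provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // "ni hao, zhong guo": four Chinese characters and an ASCII comma
    static final String TEXT = "\u4f60\u597d,\u4e2d\u56fd";

    // Number of bytes the text occupies in the given charset
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // What getBytes() without arguments will use in this JVM
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        System.out.println("chars: " + TEXT.length());                               // 5
        System.out.println("UTF-8: " + encodedLength(TEXT, StandardCharsets.UTF_8)); // 13
        System.out.println("GBK:   " + encodedLength(TEXT, Charset.forName("GBK")));  // 9
        System.out.println("default: " + TEXT.getBytes().length); // environment-dependent
    }
}
```

Since the character count matches none of the byte counts, String.length() must never be used as a byte count when writing encoded text.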

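List[1]. String.getBytes() producing messy code

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class LogWriter { // class name is illustrative
    private static final String fileName = "c:\\log.txt";

    public static void main(String[] args) {
        String str = "你好,中国";
        writeError(str);
    }

    private static void writeError(String a_error) {
        File logFile = new File(fileName);
        // Create a byte output stream in append mode; try-with-resources closes it
        try (FileOutputStream outPutStream = new FileOutputStream(logFile, true)) {
            // Encode the string with the platform's default charset: the resulting
            // bytes depend on file.encoding, which is what causes the messy code
            byte[] bytes = a_error.getBytes();
            // Write every encoded byte (a_error.length() counts chars, not bytes)
            outPutStream.write(bytes, 0, bytes.length);
            outPutStream.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}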

List[2]. Using OutputStreamWriter to specify the charset

// Uses the fileName field from List[1]; also needs imports for
// java.io.BufferedWriter, java.io.OutputStreamWriter and java.io.Writer
private static void writeErrorWithCharSet(String a_error) {
    File logFile = new File(fileName);
    String charsetName = "utf-8";
    // Use UTF-8 for the character-to-byte conversion; try-with-resources
    // flushes and closes the writer even if write() throws
    try (Writer m_write = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(logFile, true), charsetName))) {
        m_write.write(a_error);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

To avoid messy code, prefer the I/O API overloads that take an explicit charset argument when reading or writing text.
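As a minimal sketch of the same advice with the java.nio.file API (List.of needs Java 9+), every call below names the charset explicitly, so the result no longer depends on file.encoding. The class and method names are illustrative, and a temporary file stands in for a real log file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class ExplicitCharsetIo {
    // Append one line encoded as UTF-8, creating the file if it does not exist
    static void appendLine(Path file, String line) throws IOException {
        Files.write(file, List.of(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Read the lines back, decoding with the same charset used for writing
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("log", ".txt");
        try {
            appendLine(log, "\u4f60\u597d,\u4e2d\u56fd");
            System.out.println(readLines(log).size()); // 1
        } finally {
            Files.delete(log);
        }
    }
}
```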

Web Application

Web messy code

Causes:

  1. Browsers do not all follow the URI encoding standard, the server's encoding and decoding are not configured, or the developer makes mistakes.

  2. GET requests: non-ASCII characters in the URL are percent-encoded (URL-encoded).

    A URL has the form host:port/contextPath/servletPath/pathInfo?queryString. How pathInfo and queryString are decoded depends on the server. Tomcat configures this in server.xml: pathInfo is decoded with the charset set in the Connector's URIEncoding attribute, and queryString is decoded with the request body encoding when useBodyEncodingForURI is enabled (otherwise URIEncoding applies, which defaults to UTF-8 since Tomcat 8.0).

    To avoid unwanted encodings, it is best to use only ASCII characters in URLs (or URL-encode them first).

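  3. POST requests: the browser encodes the form data with the charset of the page containing the form, i.e. the charset in its content type (e.g. “text/html;charset=utf-8”).

    <%@ page language="java" contentType="text/html; charset=GB18030" pageEncoding="UTF-8"%>

    Here charset (GB18030) is the encoding of the response sent to the browser, while pageEncoding (UTF-8) is the encoding the JSP file itself is saved in.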
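The GET case can be simulated with java.net.URLEncoder and URLDecoder (the Charset overloads need Java 10+). This sketch assumes the browser sent UTF-8 and shows what happens when a server decodes the query string as ISO-8859-1 instead, together with the common ISO-8859-1 round-trip repair. Class and method names are hypothetical. Note that URLEncoder implements HTML form encoding (a space becomes "+"), so it suits query parameters rather than path segments.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryStringDemo {
    static final String NAME = "\u4f60\u597d"; // ni hao

    // What a UTF-8 page puts in the URL: each non-ASCII byte becomes %XX
    static String encodeParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // A server decoding as ISO-8859-1 turns each escaped byte into one char
    static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // Workaround: recover the raw bytes via ISO-8859-1, then decode them as UTF-8
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encodeParam(NAME);
        System.out.println(encoded);                      // %E4%BD%A0%E5%A5%BD
        String garbled = decodeAsLatin1(encoded);
        System.out.println(garbled.length());             // 6
        System.out.println(repair(garbled).equals(NAME)); // true
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8).equals(NAME)); // true
    }
}
```

Configuring the server's decoding charset is the cleaner fix; the round trip is only a workaround for when that is not possible.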

List[3]. Setting the content type for a POST request

protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    if (!ServletFileUpload.isMultipartContent(request)) {
        throw new ServletException("Content type is not multipart/form-data");
    }
    response.setCharacterEncoding("UTF-8"); // Set the response encoding
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();
    out.write("<html><head></head><body>");
    try {
        // uploader: presumably a ServletFileUpload instance (not shown in the original)
        List<FileItem> items = (List<FileItem>) uploader.parseRequest(request);
        …
}

The JSP page that sends the request with the POST method:

<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>index</title>
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
</head>
<body>
<form action="FileUploadServlet" method="post" enctype="multipart/form-data">
<!-- Label: "Select a file to upload:" -->
选择上传文件:<input type="file" name="fileName">
<br>
<!-- Button text: "Upload" -->
<input type="submit" value="上传">
</form>
</body>
</html>

- Browser display: Chrome decodes the page with the charset from the JSP's contentType, while Firefox uses its Text Encoding setting.

- For JSP (HTML): the container reads the JSP file using pageEncoding, so the file must be saved in that encoding; if pageEncoding is not specified, the charset from contentType is used, and if that is missing too, the default ISO-8859-1. The charset tells the browser how to decode the page.

- For dynamic content: the server sets the HTTP Content-Type header with HttpServletResponse.setContentType.

Garbled file names when downloading

Reason: HTTP headers only support ASCII characters, so other characters in the file name are turned into 0x3F (“?”).

Solution: encode the file name with URLEncoder.encode(filename, charset) first, then put it in the header.

List[4]. URL-encoding the file name in the Content-Disposition header

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // getDecodeParameter: presumably the author's helper (not shown) that reads and decodes a parameter
    String fileName = getDecodeParameter(request, "fileName");
    String userName = getDecodeParameter(request, "username");
    response.setHeader("Content-Disposition", "attachment; filename=\""
            + URLEncoder.encode(fileName, "utf-8") + "\";userName=\""
            + URLEncoder.encode(userName, "utf-8") + "\"");
}
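Putting a URL-encoded name inside the quoted filename parameter relies on browsers decoding it, which is not standardized. RFC 6266 (using the encoding defined in RFC 5987) adds a filename* parameter for non-ASCII names. A minimal sketch, with hypothetical helper names and an ASCII fallback name chosen by the caller (it must not contain quotes), might look like this:

```java
import java.nio.charset.StandardCharsets;

class ContentDispositionDemo {
    // Punctuation that RFC 5987 allows unescaped (attr-char, besides letters and digits)
    private static final String ATTR_PUNCT = "!#$&+-.^_`|~";

    // Percent-encode the UTF-8 bytes of value, leaving only attr-char unescaped
    static String rfc5987Encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || ATTR_PUNCT.indexOf(c) >= 0;
            if (plain) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Header value: ASCII fallback for old clients plus filename* with the real name
    static String attachment(String fileName, String asciiFallback) {
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''"
                + rfc5987Encode(fileName);
    }

    public static void main(String[] args) {
        System.out.println(attachment("\u4f60\u597d 1.txt", "download.txt"));
        // attachment; filename="download.txt"; filename*=UTF-8''%E4%BD%A0%E5%A5%BD%201.txt
    }
}
```

In a servlet the result would be passed to response.setHeader("Content-Disposition", …). Unlike URLEncoder, this encodes a space as %20 and "*" as %2A, as RFC 5987 requires.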

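Database operations

Database messy code

Bridge: Unicode. Unicode serves as the bridge between the encodings involved.

Three places determine the encoding: the database server, the client system, and the client environment variables.

Creating the database with UTF-8, or using SQL NCHAR types, solves multilingual issues.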
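The "Unicode as a bridge" idea can be illustrated without a database: converting text between two encodings goes bytes → Unicode String → bytes. This is an illustrative sketch (hypothetical class name, GBK assumed available as in standard JDK builds), not driver code; it only mirrors the kind of conversion a client performs between its own charset and the server's.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnicodeBridgeDemo {
    // Decode with the source charset into a Unicode String, then encode with the target
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        String unicode = new String(data, from);
        return unicode.getBytes(to);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] gbkBytes = "\u4f60\u597d".getBytes(gbk); // two Chinese characters
        byte[] utf8Bytes = transcode(gbkBytes, gbk, StandardCharsets.UTF_8);
        System.out.println(gbkBytes.length + " -> " + utf8Bytes.length); // 4 -> 6
    }
}
```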

