Java Integer.parseInt(String s) throws NumberFormatException: the Unicode 65279 problem in text processing

Article link: https://blog.youkuaiyun.com/weixin_42761904/article/details/107208702

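Problem description

I was recently processing a text file and needed to take the first string at the start of the text and convert it to an int (e.g. the string "52" becomes 52). I used Java's Integer.parseInt(String s), but it threw a NumberFormatException.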

Solution

Drop the first character with s = s.substring(1); and then call parseInt on the result.
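Dropping the first character unconditionally would corrupt input that has no invisible leading character. A minimal sketch of a more defensive variant follows. It assumes a hypothetical helper named `stripBom` (not part of the JDK) that removes the first character only when it is U+FEFF (decimal 65279), the character identified later in this post:

```java
class BomSafeParse {
    // U+FEFF (decimal 65279), the byte order mark discussed in this post.
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: drops one leading BOM if present,
    // otherwise returns the input unchanged.
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        String raw = "\uFEFF52"; // what the first token of the file looked like
        System.out.println(Integer.parseInt(stripBom(raw)));  // 52
        System.out.println(Integer.parseInt(stripBom("52"))); // 52, clean input is untouched
    }
}
```

Unlike a bare substring(1), this leaves strings without a BOM intact, so the same code path can handle files saved with or without one.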

Cause

Part of the source of parseInt(String s, int radix), which parseInt(String s) calls with a radix of 10, is shown below:

     if (s == null) {
         throw new NumberFormatException("null");
     }

     if (radix < Character.MIN_RADIX) {
         throw new NumberFormatException("radix " + radix +
                                         " less than Character.MIN_RADIX");
     }

     if (radix > Character.MAX_RADIX) {
         throw new NumberFormatException("radix " + radix +
                                         " greater than Character.MAX_RADIX");
     }

     int result = 0;
     boolean negative = false;
     int i = 0, len = s.length();
     int limit = -Integer.MAX_VALUE;
     int multmin;
     int digit;

     if (len > 0) {
         char firstChar = s.charAt(0);
         if (firstChar < '0') { // Possible leading "+" or "-"
             if (firstChar == '-') {
                 negative = true;
                 limit = Integer.MIN_VALUE;
             } else if (firstChar != '+')
                 throw NumberFormatException.forInputString(s);

             if (len == 1) // Cannot have lone "+" or "-"
                 throw NumberFormatException.forInputString(s);
             i++;
         }
         multmin = limit / radix;
         while (i < len) {
             // Accumulating negatively avoids surprises near MAX_VALUE
             digit = Character.digit(s.charAt(i++),radix);
             if (digit < 0) {
                 throw NumberFormatException.forInputString(s);
             }
             if (result < multmin) {
                 throw NumberFormatException.forInputString(s);
             }
             result *= radix;
             if (result < limit + digit) {
                 throw NumberFormatException.forInputString(s);
             }
             result -= digit;
         }
     } else {
         throw NumberFormatException.forInputString(s);
     }
     return negative ? result : -result;

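As the code shows, NumberFormatException is thrown in the following cases:

The string is null or empty (length 0)
The radix is outside the range Character.MIN_RADIX to Character.MAX_RADIX (it defaults to 10 when not passed)
The string contains a character that is not a valid digit in the given radix
The first character is neither a digit nor a plus or minus sign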
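The cases listed above can be reproduced with a short sketch. The class name `ParseIntFailures` and the helper `fails` are hypothetical names chosen for illustration; only Integer.parseInt itself comes from the JDK:

```java
class ParseIntFailures {
    // Returns true if Integer.parseInt(s, radix) throws NumberFormatException.
    static boolean fails(String s, int radix) {
        try {
            Integer.parseInt(s, radix);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fails(null, 10));       // true: null string
        System.out.println(fails("", 10));         // true: empty string
        System.out.println(fails("52", 1));        // true: radix below Character.MIN_RADIX
        System.out.println(fails("52", 37));       // true: radix above Character.MAX_RADIX
        System.out.println(fails("5a2", 10));      // true: 'a' is not a digit in radix 10
        System.out.println(fails("*52", 10));      // true: first char is neither digit nor sign
        System.out.println(fails("\uFEFF52", 10)); // true: invisible leading U+FEFF
        System.out.println(fails("-52", 10));      // false: valid input
    }
}
```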

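So I checked the four cases one by one. The first three did not apply, which left the fourth: an illegal character at String.charAt(0) triggers this exception. I therefore printed the string one character at a time by index and found the following:

The first character of the string looks blank, but its code value is 65279, and the string I expected to be "52" has a length of 3 rather than 2 because of that extra invisible first character. That was the cause.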
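The character-by-character check described above can be sketched as follows. The class `CharDump` and its `dump` method are hypothetical names, and the sample string is built by hand here rather than read from the original file:

```java
class CharDump {
    // Builds one line per char: index, decimal code value, and hex code point.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(String.format("%d: %d (U+%04X)%n", i, (int) c, (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\uFEFF52"; // stand-in for the string read from the file
        System.out.println(s.length()); // 3
        System.out.print(dump(s));
        // 0: 65279 (U+FEFF)
        // 1: 53 (U+0035)
        // 2: 50 (U+0032)
    }
}
```

Printing numeric code values instead of the characters themselves is what makes a zero-width character visible.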

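I read up on this character. Its general purpose is to sit at the very start of a text and mark that text as UTF-8 encoded.
Notes: 1. The character with Unicode code point 65279 (U+FEFF) is named "ZERO WIDTH NO-BREAK SPACE", a space with no width. It is invisible, but it is a real character rather than null or an empty string, which is why it counts toward the string length. It is also known as the BOM (Byte Order Mark).
2. The BOM marks whether a byte stream is big-endian or little-endian. UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to signal the encoding.
3. Windows Notepad (by default in older versions) automatically adds a BOM when it saves text as UTF-8.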
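To connect the notes above to the parseInt failure: in UTF-8 the BOM is stored as the three bytes EF BB BF, and Java's UTF-8 decoder keeps it as a leading U+FEFF character instead of discarding it. A minimal sketch follows, with the byte array standing in for the contents of a file saved with a BOM (the class and method names are hypothetical):

```java
class Utf8BomDemo {
    // Decodes bytes as UTF-8; a leading BOM survives decoding as U+FEFF.
    static String decode(byte[] bytes) {
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF, followed by the ASCII digits "52".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '5', '2'};
        String text = decode(fileBytes);
        System.out.println(text.length());        // 3
        System.out.println((int) text.charAt(0)); // 65279
    }
}
```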