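While working on LeetCode problem 151, Reverse Words in a String, I noticed that calling split() without first calling trim() can leave empty strings in the result. To find out why, I read the source of split and wrote down my own take on it.
Source of split()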
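To make the problem concrete, here is a minimal sketch of what goes wrong. The class name `SplitWithoutTrimDemo`, its two helper methods, and the sample input are my own and purely illustrative; only `String.split` and `String.trim` come from the JDK.

```java
import java.util.Arrays;

class SplitWithoutTrimDemo {
    // Splits on a single space without trimming first.
    static String[] splitRaw(String s) {
        return s.split(" ");
    }

    // Trims leading and trailing whitespace, then splits on a single space.
    static String[] splitTrimmed(String s) {
        return s.trim().split(" ");
    }

    public static void main(String[] args) {
        String input = " hello world "; // hypothetical sample input
        System.out.println(Arrays.toString(splitRaw(input)));
        System.out.println(Arrays.toString(splitTrimmed(input)));
    }
}
```

Without trim() the leading space produces an empty first element, while the trailing space does not produce an empty last element. The rest of this note explains that asymmetry.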
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     * (1) one-char String and this character is not one of the
     *     RegEx's meta characters ".$|()[{^?*+\\", or
     * (2) two-char String and the first char is the backslash and
     *     the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.length() == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                int last = length();
                list.add(substring(off, last));
                off = last;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};
        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, length()));
        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).isEmpty()) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
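First, the fast path. It covers the case where regex is simple: either it has length 1 and that character is not a regex metacharacter, or it has length 2, the first character is a backslash \, and the second character is neither an ASCII digit nor an ASCII letter. In both cases the character must also not be a surrogate.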
if (((regex.length() == 1 &&
     ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
     (regex.length() == 2 &&
      regex.charAt(0) == '\\' &&
      (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
      ((ch-'a')|('z'-ch)) < 0 &&
      ((ch-'A')|('Z'-ch)) < 0)) &&
    (ch < Character.MIN_HIGH_SURROGATE ||
     ch > Character.MAX_LOW_SURROGATE))
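If the condition above is true (meaning the pattern can be matched as a single literal character), matching can be done with indexOf. The condition stores the literal character in ch (regex.charAt(0) in the one-character case, regex.charAt(1) in the escaped case), and the string is then cut up with indexOf and substring.
Here off is the index where the next search starts, next is the position of the next occurrence of ch as found by indexOf, and limit is the split limit parameter. limited is true only when limit > 0, so a limit of 0 or a negative limit means there is no cap on the number of pieces (!limited means "no cap"). list holds the result.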
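To see which patterns actually qualify for the fast path, the condition can be lifted into a standalone helper. This is only a sketch: the class `FastPathCheck` and the method `isFastPath` are hypothetical names of mine, and the body is the condition from the source copied as is.

```java
class FastPathCheck {
    // Hypothetical helper: same condition as the fast-path check in String.split.
    static boolean isFastPath(String regex) {
        char ch = 0;
        return ((regex.length() == 1 &&
                 ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                (regex.length() == 2 &&
                 regex.charAt(0) == '\\' &&
                 (((ch = regex.charAt(1)) - '0') | ('9' - ch)) < 0 &&
                 ((ch - 'a') | ('z' - ch)) < 0 &&
                 ((ch - 'A') | ('Z' - ch)) < 0)) &&
               (ch < Character.MIN_HIGH_SURROGATE ||
                ch > Character.MAX_LOW_SURROGATE);
    }

    public static void main(String[] args) {
        // A few sample patterns; "\\." is the two-character string backslash + dot.
        for (String regex : new String[]{" ", "a", ".", "\\.", "\\d", "\\s+"}) {
            System.out.println("\"" + regex + "\" -> " + isFastPath(regex));
        }
    }
}
```

A plain space or letter and an escaped metacharacter such as \. take the fast path; a bare metacharacter, a character class such as \d, and anything longer than two characters fall through to Pattern.compile.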
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
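As a small illustration of what a positive limit does, the sketch below calls the two-argument split with a few different caps. The class name `LimitDemo`, the wrapper method, and the sample string are my own choices for the demo.

```java
import java.util.Arrays;

class LimitDemo {
    // Thin wrapper around the two-argument split, with a comma as the delimiter.
    static String[] splitWithLimit(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        String s = "a,b,c,d"; // hypothetical sample input
        // With limit == 2 the array has at most 2 elements;
        // the last element keeps the rest of the string unsplit.
        System.out.println(Arrays.toString(splitWithLimit(s, 2)));
        System.out.println(Arrays.toString(splitWithLimit(s, 3)));
        // A limit larger than the number of pieces changes nothing.
        System.out.println(Arrays.toString(splitWithLimit(s, 10)));
    }
}
```

With a positive limit the loop stops splitting once list.size() reaches limit - 1 and puts everything that remains into the last element.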
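The loop below does the actual splitting. Starting from off, indexOf looks for ch; if the result is not -1, the substring covering [off, next) is added to the list, and off is updated to next + 1 before the next iteration.
This has one consequence: if the character at position off is itself equal to ch, then next == off, substring covers [off, off), and an empty string is added to list.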
while ((next = indexOf(ch, off)) != -1) {
    if (!limited || list.size() < limit - 1) {
        // if off == next, substring returns ""
        list.add(substring(off, next));
        off = next + 1;
    } else {    // last one
        //assert (list.size() == limit - 1);
        int last = length();
        list.add(substring(off, last));
        off = last;
        break;
    }
}
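The loop can be reproduced outside of String to watch the empty strings appear. The following is a simplified sketch for the uncapped case only; `SimpleCharSplit` and `splitOnChar` are hypothetical names, the limit handling is left out, and trailing empty strings are deliberately not removed so that every piece the loop produces stays visible.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCharSplit {
    // Simplified fast-path loop: no limit, no removal of trailing empty strings.
    static List<String> splitOnChar(String s, char ch) {
        List<String> list = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = s.indexOf(ch, off)) != -1) {
            // When the delimiter sits exactly at off, this adds "".
            list.add(s.substring(off, next));
            off = next + 1;
        }
        // Remaining segment after the last delimiter (the whole string if none was found).
        list.add(s.substring(off));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("a,,b", ','));
        System.out.println(splitOnChar(",a", ','));
        System.out.println(splitOnChar("a,", ','));
    }
}
```

Two delimiters in a row, a delimiter at index 0, and a delimiter at the very end each yield one empty piece at this stage.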
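Finally, because the one-argument split(regex) passes limit == 0, the empty strings at the end of list are removed. Note that this step is tied to limit == 0 specifically, not to !limited in general: a negative limit is also uncapped but keeps the trailing empty strings.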
// Construct result
// Note: when limit == 0 (no cap on the number of pieces), trailing empty strings are dropped from the result
int resultSize = list.size();
if (limit == 0) {
    while (resultSize > 0 && list.get(resultSize - 1).isEmpty()) {
        resultSize--;
    }
}
String[] result = new String[resultSize];
// cut off the trailing empty strings
return list.subList(0, resultSize).toArray(result);
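The effect of this last step is easiest to see by comparing limit == 0 with a negative limit on the same input. A small sketch, where the class `TrailingEmptyDemo`, the helper `pieces`, and the sample strings are my own:

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Returns how many elements split produces for the given limit.
    static int pieces(String s, String regex, int limit) {
        return s.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String s = "bcaaa"; // hypothetical sample input: three trailing delimiters
        // limit == 0: trailing empty strings are removed.
        System.out.println(Arrays.toString(s.split("a", 0)));
        // limit < 0: same splitting, but trailing empty strings are kept.
        System.out.println(Arrays.toString(s.split("a", -1)));
        System.out.println(pieces(s, "a", 0) + " vs " + pieces(s, "a", -1));
    }
}
```

If the trailing empty strings matter, passing a negative limit is the way to keep them.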
For example, take the code below (each of the example strings contains 11 a's)
String a = "aaaaaaaaaaabc";
String[] res = a.split("a");
The final contents of res are
# 11 empty strings
["","","","","","","","","","","","bc"]
For the code below
String a = "baaaaaaaaaaac";
String[] res = a.split("a");
The final result is
# 10 empty strings, because locating the first a emits "b", and next then moves on to the following a's
["b","","","","","","","","","","","c"]
For the code below
String a = "bcaaaaaaaaaaa";
String[] res = a.split("a");
The final result is
# 0 empty strings (all of them are at the end and get removed)
["bc"]
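Coming back to problem 151, the three cases above suggest one possible way to handle the input: trim() removes the leading spaces that would otherwise produce a leading empty string, trailing empty strings are already dropped by split, and empty strings caused by consecutive spaces in the middle have to be skipped by hand. The following is a sketch of that idea, not the only or the official solution; the class and method names are mine.

```java
class ReverseWords {
    // Reverses the order of words, collapsing any run of spaces into one.
    static String reverseWords(String s) {
        String[] words = s.trim().split(" ");
        StringBuilder sb = new StringBuilder();
        for (int i = words.length - 1; i >= 0; i--) {
            if (words[i].isEmpty()) {
                continue; // empty piece produced by consecutive spaces
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseWords("  hello   world  "));
        System.out.println(reverseWords("a good   example"));
    }
}
```

Because empty pieces are skipped inside the loop anyway, trim() is not strictly required here, but it keeps the array free of a leading empty element.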