Source Code Analysis of the split Function in Scala

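This article shows how string splitting works, with usage in Scala, a reimplementation in Python, and concrete code examples. It also walks through the internal implementation of the split method in Java's String class.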


split is used as follows:

scala> val s = "qw#ert#yu#u"
s: String = qw#ert#yu#u

scala> s.split("#")
res5: Array[String] = Array(qw, ert, yu, u)


Looking at the source shows that Scala's String is java.lang.String, so the implementation lives in Java's String class:

public String[] split(String regex) {
    return split(regex, 0);
}

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||  // regex is a single char and not one of these metacharacters
         (regex.length() == 2 &&  
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&    // regex is two chars: a backslash followed by a char that is not an ASCII digit or letter
        (ch < Character.MIN_HIGH_SURROGATE ||  
         ch > Character.MAX_LOW_SURROGATE))  // ch is not a UTF-16 surrogate code unit
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }    // when limit == 0, drop trailing empty strings ("")
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
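
The fast-path condition at the top of split(String, int) can be read as a predicate on the regex string. The following is a minimal Python sketch of that predicate, not the JDK code itself. The names META, outside, and is_fastpath are hypothetical and chosen only for this illustration. It also assumes that any code point above U+FFFF can be rejected up front, because in Java such a character occupies two UTF-16 code units and so never matches the one-char or backslash-plus-one-char shapes.

```python
META = ".$|()[{^?*+\\"  # the metacharacters listed in the Java source comment


def outside(ch, lo, hi):
    # Mirrors Java's ((ch - lo) | (hi - ch)) < 0.
    # The bitwise OR of two ints is negative exactly when at least one
    # of them is negative, i.e. when ch < lo or ch > hi.
    return ((ord(ch) - ord(lo)) | (ord(hi) - ord(ch))) < 0


def is_fastpath(regex):
    # Assumption: a code point above U+FFFF takes two UTF-16 units in Java,
    # so such a regex can never have the required length there.
    if any(ord(c) > 0xFFFF for c in regex):
        return False
    if len(regex) == 1 and regex not in META:
        ch = regex  # case (1): one ordinary char
    elif (len(regex) == 2 and regex[0] == "\\"
          and outside(regex[1], "0", "9")
          and outside(regex[1], "a", "z")
          and outside(regex[1], "A", "Z")):
        ch = regex[1]  # case (2): backslash + non-alphanumeric char
    else:
        return False
    # Reject UTF-16 surrogate code units (U+D800..U+DFFF).
    return not (0xD800 <= ord(ch) <= 0xDFFF)
```

Under this sketch, "#" and "\\." qualify for the fast path, while ".", "\\d", and "ab" fall through to Pattern.compile(regex).split(this, limit).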

A simple reimplementation of the fast path in Python:

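def split(target, sep):
    # sep is matched literally (no regex support), like the Java fast path
    if not sep:
        raise ValueError("empty separator")
    off = 0
    res = []
    next_idx = target.find(sep, off)  # find() returns -1 when nothing is left
    while next_idx != -1:
        res.append(target[off:next_idx])
        off = next_idx + len(sep)  # skip the whole separator, not just one char
        next_idx = target.find(sep, off)
    res.append(target[off:])  # remaining segment
    return res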

print(split("qq#ee#rr#t","#"))
> ['qq', 'ee', 'rr', 't']
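
The simple version leaves out two things the Java loop handles: the limit argument and the removal of trailing empty strings when limit is 0. A rough sketch that follows the Java fast path step by step might look like the following. The function name java_like_split is hypothetical, and the sketch assumes the separator is a single ordinary character, as on the fast path.

```python
def java_like_split(target, ch, limit=0):
    # Assumption: ch is a single literal character (the fast-path case).
    off = 0
    limited = limit > 0
    parts = []
    nxt = target.find(ch, off)
    while nxt != -1:
        if not limited or len(parts) < limit - 1:
            parts.append(target[off:nxt])
            off = nxt + 1
        else:
            # Last allowed piece: take everything that remains.
            parts.append(target[off:])
            off = len(target)
            break
        nxt = target.find(ch, off)
    # No match found: return the input as the only element.
    if off == 0:
        return [target]
    # Add the remaining segment.
    if not limited or len(parts) < limit:
        parts.append(target[off:])
    # limit == 0 drops trailing empty strings; a negative limit keeps them.
    if limit == 0:
        while parts and parts[-1] == "":
            parts.pop()
    return parts


print(java_like_split("a#b#c##", "#"))      # limit 0
print(java_like_split("a#b#c##", "#", 2))   # at most 2 pieces
print(java_like_split("a#b#c##", "#", -1))  # keep trailing empty strings
```

With limit 0 the two trailing empty strings are dropped, with limit 2 the second piece is the unsplit remainder "b#c##", and with a negative limit all five pieces are kept.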



