Reading the Source of Java's String Class (Part 1)

Article link: https://blog.youkuaiyun.com/dzydzy7/article/details/106085754

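Overview

This post walks through the String class in JDK 8, drawing mainly on the source code and the official documentation. Key points are annotated or illustrated with examples.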

/*
 * Copyright (c) 1994, 2013, Oracle and/or its affiliates. All rights reserved.
 * ORACLE PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
 */

package java.lang;

import java.io.ObjectStreamField;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Formatter;
import java.util.Locale;
import java.util.Objects;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared. For example:

   String str = "abc";

is equivalent to:

   char data[] = {'a', 'b', 'c'};
   String str = new String(data);

Here are some more examples of how strings can be used:

System.out.println("abc");   
String cde = "cde";   
System.out.println("abc" + cde);   
String c = "abc".substring(2,3);   
String d = cde.substring(1, 2);
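
As a small sketch of the equivalence and the substring calls above (the class name LiteralDemo and the helper fromChars are made up for illustration), the following shows that a string built from a char array has the same content as the literal but is a separate object:

```java
class LiteralDemo {
    // Builds a string from a char array, as in the documentation's example.
    static String fromChars() {
        char[] data = {'a', 'b', 'c'};
        return new String(data);
    }

    public static void main(String[] args) {
        String literal = "abc";
        String built = fromChars();
        System.out.println(literal.equals(built)); // true: same character sequence
        System.out.println(literal == built);      // false: two distinct objects
        System.out.println("abc".substring(2, 3)); // c
        System.out.println("cde".substring(1, 2)); // d
    }
}
```

This is why string content should be compared with equals rather than ==.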

The class String includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the Character class.

The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder (or StringBuffer) class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification.
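
A rough illustration of the paragraph above, with hypothetical names (ConcatDemo, withOperator, withBuilder). On JDK 8 the compiler typically turns + on strings into StringBuilder.append calls. Newer JDKs may use a different strategy, so the two methods below should be read as equivalent in result, not necessarily in bytecode:

```java
class ConcatDemo {
    // Concatenation written with the + operator.
    static String withOperator(String name, int n) {
        return "id=" + name + n;
    }

    // The same concatenation spelled out with StringBuilder.append.
    static String withBuilder(String name, int n) {
        return new StringBuilder().append("id=").append(name).append(n).toString();
    }

    public static void main(String[] args) {
        System.out.println(withOperator("x", 7));                        // id=x7
        System.out.println(withOperator("x", 7).equals(withBuilder("x", 7))); // true
        // String conversion of null yields the text "null".
        System.out.println(withOperator(null, 7));                       // id=null7
    }
}
```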

Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown.

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).

Since: JDK1.0

See Also: Object.toString(), StringBuffer, StringBuilder, Charset, Serialized Form

Fields

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;
    
     /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];
    
    // A Comparator that orders String objects as by compareToIgnoreCase.
    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();

As the fields show, the characters are stored in a char array.

Constructors

Initializes a newly created {@code String} object so that it represents an empty character sequence. Note that use of this constructor is unnecessary since Strings are immutable.

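This is the no-argument constructor, which builds an empty string. It is rarely needed, because the content of a String can never change.

    public String() {
        this.value = "".value;	// value is a field of the String class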
    }

Initializes a newly created {@code String} object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of {@code original} is needed, use of this constructor is unnecessary since Strings are immutable.

The new object takes over the original string's char array reference and its cached hash value. The array itself is not duplicated, which is safe because it is never modified.

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

Allocates a new {@code String} so that it represents the sequence of characters currently contained in the character array argument. The contents of the character array are copied; subsequent modification of the character array does not affect the newly created string.

The character array is copied into the String object's value field. Because it is a copy, later changes to the original array do not affect the String.

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }
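
A minimal sketch of the defensive copy described above (CharArrayCopyDemo and build are hypothetical names): the array is changed right after the string is constructed, and the string keeps its original content.

```java
class CharArrayCopyDemo {
    // Creates a String from the array, then mutates the array.
    static String build(char[] data) {
        String s = new String(data); // copies the array contents
        data[0] = 'X';               // change the caller's array afterwards
        return s;                    // the string is unaffected
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c'};
        System.out.println(build(data));      // abc
        System.out.println(new String(data)); // Xbc
    }
}
```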

Allocates a new {@code String} that contains characters from a subarray of the character array argument. The {@code offset} argument is the index of the first character of the subarray and the {@code count} argument specifies the length of the subarray. The contents of the subarray are copied; subsequent modification of the character array does not affect the newly created string.

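Copies a subarray of the character array (again a real copy). offset is the index of the first character and count is the number of characters to copy.

    public String(char value[], int offset, int count) {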
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1 (Integer.MAX_VALUE).
        // The intended check is offset + count > value.length,
        // but adding the two numbers could overflow, so it is rearranged.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
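
To see why the range check is written as offset > value.length - count, here is a sketch that compares it with the naive form. The class and method names are hypothetical, and both methods assume offset and count were already verified to be non-negative, as the constructor does:

```java
class BoundsCheckDemo {
    // Naive form: offset + count can overflow and wrap to a negative number.
    static boolean naiveOutOfRange(int length, int offset, int count) {
        return offset + count > length;
    }

    // Rearranged form: length - count cannot overflow when count >= 0.
    static boolean safeOutOfRange(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 10;
        int offset = Integer.MAX_VALUE;
        int count = 5;
        System.out.println(naiveOutOfRange(length, offset, count)); // false: the overflow hides the error
        System.out.println(safeOutOfRange(length, offset, count));  // true: correctly rejected
    }
}
```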

Allocates a new {@code String} that contains characters from a subarray of the Unicode code point array argument. The {@code offset} argument is the index of the first code point of the subarray and the {@code count} argument specifies the length of the subarray. The contents of the subarray are converted to {@code char}s; subsequent modification of the {@code int} array does not affect the newly created string.

In other words, each int in the chosen range is treated as a Unicode code point and converted to one or two chars. Changing the int array afterwards has no effect on the new string.

    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
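
A short usage sketch for this constructor (CodePointCtorDemo and fromCodePoints are made-up names). It shows that a supplementary code point takes two chars in the result, and that a value outside the Unicode range is rejected:

```java
class CodePointCtorDemo {
    // Builds a String from code points; the helper name is hypothetical.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+98DE is a BMP character, U+1F60B is a supplementary character.
        String s = fromCodePoints(0x98DE, 0x1F60B);
        System.out.println(s.length());                      // 3 chars: 1 + 2
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        try {
            fromCodePoints(0x110000); // beyond U+10FFFF, not a valid code point
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid code point");
        }
    }
}
```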

checkBounds() is a helper that checks whether the requested range lies within the array:

private static void checkBounds(byte[] bytes, int offset, int length) {
    if (length < 0)
        throw new StringIndexOutOfBoundsException(length);
    if (offset < 0)
        throw new StringIndexOutOfBoundsException(offset);
    if (offset > bytes.length - length)
        throw new StringIndexOutOfBoundsException(offset + length);
}

The deprecated constructors (the ones whose byte array parameter is named ascii) are not covered here.

Constructs a new String by decoding the specified subarray of bytes using the specified charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray.

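In short, the given byte range is decoded with the given charset, so the length of the resulting string depends on the charset and may differ from the number of bytes.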

public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value =  StringCoding.decode(charset, bytes, offset, length);
}

Same idea, but the whole array is decoded and the charset is given by name:

public String(byte bytes[], String charsetName)
    throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}
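
The following sketch illustrates the point that the string length is a function of the charset. DecodeDemo and decodeUtf8 are hypothetical names, and UTF-8 is picked only as an example charset:

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes a byte range as UTF-8 (the helper name is made up).
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to avoid source-encoding issues.
        byte[] bytes = "\u98DE\u673A".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                             // 6 bytes
        System.out.println(decodeUtf8(bytes, 0, 6).length());         // 2 chars
        System.out.println(decodeUtf8(bytes, 3, 3).equals("\u673A")); // true: second character only
    }
}
```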

Constructs a new String by decoding the specified subarray of bytes using the platform’s default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray.

Decodes a subarray using the platform's default charset:

public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}

Decodes the whole array using the platform's default charset:

public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}

Allocates a new string that contains the sequence of characters currently contained in the string buffer argument. The contents of the string buffer are copied; subsequent modification of the string buffer does not affect the newly created string.

Copies the content of a StringBuffer instance into the String object:

public String(StringBuffer buffer) {
    synchronized(buffer) {	// lock the buffer while copying
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

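Copies the content of a StringBuilder instance into the String object. Calling toString on the builder is likely to be faster than using this constructor.

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}	// not synchronized, because StringBuilder is not thread-safe anyway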
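
A small sketch of both routes from a builder to a string (BuilderDemo and snapshot are hypothetical names). Either way the result is a snapshot, and later changes to the builder do not show up in it:

```java
class BuilderDemo {
    // Takes a snapshot with toString, then keeps modifying the builder.
    static String snapshot(StringBuilder sb) {
        String s = sb.toString(); // generally preferred over new String(sb)
        sb.append("!");           // later change to the builder
        return s;                 // the snapshot is unaffected
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        String viaConstructor = new String(sb);
        String viaToString = snapshot(sb);
        System.out.println(viaConstructor.equals(viaToString)); // true
        System.out.println(sb);                                 // abc!
    }
}
```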

Package private constructor which shares value array for speed. This constructor is always expected to be called with share==true. A separate constructor is needed because we already have a public String(char[]) constructor that makes a copy of the given char[].

The package-private constructor that shares the char array instead of copying it:

String(char[] value, boolean share) {
    // assert share : "unshared not supported";
    this.value = value;
}

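Length and index-based access

Returns the length of this string. The length is equal to the number of Unicode code units in the string.

Note that this is the number of UTF-16 code units (char values), not the number of Unicode characters: a supplementary character counts as two.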

public int length() {
    return value.length;
}

Empty check

public boolean isEmpty() {
    return value.length == 0;
}

Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.

If the char value specified by the index is a surrogate, the surrogate value is returned.

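Returns the char at the given index. If that char is one half of a 4-byte (supplementary) character, the lone surrogate is returned, which typically prints as ?.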

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
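
As a sketch of what charAt hands back for a supplementary character, the hypothetical helper below classifies the char at an index using the standard Character.isHighSurrogate and Character.isLowSurrogate checks:

```java
class CharAtDemo {
    // Describes the char at the given index (helper name is made up).
    static String describe(String s, int index) {
        char c = s.charAt(index);
        if (Character.isHighSurrogate(c)) return "high surrogate";
        if (Character.isLowSurrogate(c)) return "low surrogate";
        return "ordinary char";
    }

    public static void main(String[] args) {
        // U+98DE followed by U+1F60B, the latter written as its surrogate pair.
        String s = "\u98DE\uD83D\uDE0B";
        System.out.println(describe(s, 0)); // ordinary char
        System.out.println(describe(s, 1)); // high surrogate
        System.out.println(describe(s, 2)); // low surrogate
    }
}
```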

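Returns the Unicode code point at the given index, so both 2-byte and 4-byte characters can be read. For a 4-byte character, the index must point at its first (high-surrogate) char.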

public int codePointAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, index, value.length);
}

Returns the code point that ends just before the given index:

public int codePointBefore(int index) {
    int i = index - 1;
    if ((i < 0) || (i >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointBeforeImpl(value, index, 0);
}

Counts the Unicode code points in the given range:

public int codePointCount(int beginIndex, int endIndex) {
    if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
        throw new IndexOutOfBoundsException();
    }
    return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
}

Returns the index that lies codePointOffset code points (Unicode characters) away from the given index:

public int offsetByCodePoints(int index, int codePointOffset) {
    if (index < 0 || index > value.length) {
        throw new IndexOutOfBoundsException();
    }
    return Character.offsetByCodePointsImpl(value, 0, value.length,
                                            index, codePointOffset);
}

Let's test the methods above. Each emoji takes 4 bytes (two chars) and each Chinese character takes 2 bytes (one char).

String ss = "飞机😋🍜🐖";
System.out.println(ss.length());
System.out.println(ss.charAt(0) + ", " + ss.charAt(1) + ", " + ss.charAt(2) + ", " + ss.charAt(3) + ", " + ss.charAt(4));
System.out.println(ss.codePointCount(0, ss.length()));
System.out.println(ss.codePointAt(0) + ", " + ss.codePointAt(1) + ", " + ss.codePointAt(2) + ", " + ss.codePointAt(4) + ", " + ss.codePointAt(6));
System.out.println(ss.codePointBefore(4));
System.out.println(ss.offsetByCodePoints(0, 4));

Output:

8
飞, 机, ?, ?, ?
5
39134, 26426, 128523, 127836, 128022
128523
6
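
Building on the test above, one common way to visit a string character by character is to step by code point rather than by char. The sketch below uses codePointAt together with Character.charCount; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalkDemo {
    // Collects every code point of the string, advancing 1 or 2 chars at a time.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for supplementary characters, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        // U+98DE, U+673A and U+1F60B (written as a surrogate pair).
        System.out.println(codePoints("\u98DE\u673A\uD83D\uDE0B")); // [39134, 26426, 128523]
    }
}
```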

Copying to arrays

Copies characters from this string into the destination character array.

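That is, the characters of this String are copied into a char array.

Parameters: srcBegin is the index of the first character to copy, srcEnd is the index after the last character to copy, dst[] is the destination array, and dstBegin is the start offset in the destination array.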

void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, 0, dst, dstBegin, value.length);
}

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

In other words, the string is encoded into bytes with the named charset and the result is returned in a new byte array.

public byte[] getBytes(String charsetName)
    throws UnsupportedEncodingException {
    if (charsetName == null) throw new NullPointerException();
    return StringCoding.encode(charsetName, value, 0, value.length);
}

public byte[] getBytes(Charset charset) {
    if (charset == null) throw new NullPointerException();
    return StringCoding.encode(charset, value, 0, value.length);
}

Using the platform's default charset:

public byte[] getBytes() {
    return StringCoding.encode(value, 0, value.length);
}
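
A brief sketch of how the byte count depends on the charset (EncodeDemo and its two helpers are made-up names). UTF-8 and UTF-16BE are chosen only as examples, and UTF-16BE is used because it writes no byte order mark:

```java
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies in UTF-16BE (no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abc") + " " + utf16Length("abc"));       // 3 6
        System.out.println(utf8Length("\u98DE") + " " + utf16Length("\u98DE")); // 3 2
        // U+1F60B as a surrogate pair: 4 bytes in both encodings.
        System.out.println(utf8Length("\uD83D\uDE0B") + " " + utf16Length("\uD83D\uDE0B")); // 4 4
    }
}
```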

Example:

ss = "0123456789";
char ch[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};
ss.getChars(3, 7, ch, 1);	// copies "3456" into ch[1] through ch[4]
for (char c : ch) System.out.print(c + ", ");
// prints "a, 3, 4, 5, 6, f, g, h, i, j, "