Notes on Reading the java.lang.String Source Code

Latest recommended article published on 2023-08-30 00:24:47


License: CC 4.0 BY-SA


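This article walks through the design, internal implementation, and commonly used methods of Java's String class, covering string immutability, constructors, comparison methods, and string operations.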

I. Introduction to the String Class

The {@code String} class represents character strings. All
string literals in Java programs, such as {@code "abc"}, are
implemented as instances of this class.

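In other words, String represents character strings, and every string literal in a Java program, such as "abc", is an instance of the String class.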

Strings are constant; their values cannot be changed after they
are created. String buffers support mutable strings.
Because String objects are immutable they can be shared. For example:

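That is, a String's value is fixed once it has been created; mutable text is the job of string buffers. Immutability is what makes it safe to share String objects, for example: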

 

      String str = "abc";

is equivalent to:

 

      char data[] = {'a', 'b', 'c'}; 

      String str = new String(data);

Here are some more examples of how strings can be used:

 

      System.out.println("abc"); 

      String cde = "cde"; 

      System.out.println("abc" + cde); 

      String c = "abc".substring(2,3); 

      String d = cde.substring(1, 2);

The class {@code String} includes methods for examining
individual characters of the sequence, for comparing strings, for
searching strings, for extracting substrings, and for creating a
copy of a string with all characters translated to uppercase or to
lowercase. Case mapping is based on the Unicode Standard version
specified by the {@link java.lang.Character Character} class.

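The String class thus offers methods for examining individual characters of the sequence, comparing and searching strings, extracting substrings, and converting case. Case mapping follows the Unicode Standard version specified by the Character class.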

The Java language provides special support for the string
concatenation operator ( + ), and for conversion of
other objects to strings. String concatenation is implemented
through the {@code StringBuilder}(or {@code StringBuffer})
class and its {@code append} method.
String conversions are implemented through the method
{@code toString}, defined by {@code Object} and
inherited by all classes in Java. For additional information on
string concatenation and conversion, see Gosling, Joy, and Steele,
The Java Language Specification.

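In short, Java gives special support to the string concatenation operator + and to converting other objects to strings. According to this Javadoc, concatenation is implemented with StringBuilder (or StringBuffer) and its append() method, and conversion goes through toString(), which is defined by Object and inherited by every Java class.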
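The conversion rule can be seen in a small sketch. The class ConcatConversionDemo and its nested Point class are made up for this illustration; the only behavior relied on is that concatenation converts a reference operand the way String.valueOf(Object) does.

```java
class ConcatConversionDemo {
    // Hypothetical value class, used only for this illustration.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    static String describe(Object o) {
        // Non-null references are converted through toString();
        // a null reference becomes the text "null".
        return "value=" + o;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2))); // value=(1, 2)
        System.out.println(describe(null));            // value=null
        System.out.println(describe(42));              // value=42
    }
}
```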

Unless otherwise noted, passing a null argument to a constructor
or method in this class will cause a {@link NullPointerException} to be
thrown.

Unless stated otherwise, passing null to a constructor or method of this class throws a NullPointerException.
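A quick way to see the null rule in action is sketched below. NullArgumentDemo and its throwsNpe helper are hypothetical names for this note; equals(null) is included as one of the documented exceptions to the rule.

```java
class NullArgumentDemo {
    // Hypothetical helper: runs the action and reports whether it threw an NPE.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(() -> new String((String) null))); // true
        System.out.println(throwsNpe(() -> "abc".concat(null)));        // true
        // equals is documented to return false for null rather than throw.
        System.out.println("abc".equals(null));                         // false
    }
}
```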

A {@code String} represents a string in the UTF-16 format
in which supplementary characters are represented by surrogate
pairs (see the section Unicode
Character Representations in the {@code Character} class for
more information).
Index values refer to {@code char} code units, so a supplementary
character uses two positions in a {@code String}.

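A String holds text in UTF-16, where supplementary characters are represented by surrogate pairs. Index values refer to char code units, so a supplementary character occupies two positions in a String.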

The {@code String} class provides methods for dealing with
Unicode code points (i.e., characters), in addition to those for
dealing with Unicode code units (i.e., {@code char} values).

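Besides the methods that work on Unicode code units (that is, char values), String also provides methods that work on Unicode code points (that is, characters).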
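The difference between code units and code points shows up as soon as a string contains a supplementary character. The sketch below uses U+1F600, written as its surrogate pair; the class name CodePointDemo is made up for this example.

```java
class CodePointDemo {
    // "a", then U+1F600 as the surrogate pair \uD83D\uDE00, then "b".
    static final String TEXT = "a\uD83D\uDE00b";

    static int codeUnits() {
        return TEXT.length(); // counts char code units
    }

    static int codePoints() {
        return TEXT.codePointCount(0, TEXT.length()); // counts Unicode code points
    }

    public static void main(String[] args) {
        System.out.println(codeUnits());                                 // 4
        System.out.println(codePoints());                                // 3
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));    // 1f600
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1)));   // true
    }
}
```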

II. Source Code Analysis

1. Definition

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence

String is declared final, so it cannot be subclassed. It also implements three interfaces: Serializable, Comparable<String>, and CharSequence.
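What the interfaces buy in practice: a String can be passed wherever a CharSequence is expected, and arrays of String sort in natural order through Comparable. A small sketch, with StringInterfacesDemo and its methods invented for illustration:

```java
import java.util.Arrays;

class StringInterfacesDemo {
    // Accepts any CharSequence: String, StringBuilder, StringBuffer, ...
    static int countVowels(CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if ("aeiou".indexOf(Character.toLowerCase(cs.charAt(i))) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Arrays.sort relies on String implementing Comparable<String>.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("banana"));                 // 3
        System.out.println(countVowels(new StringBuilder("sky"))); // 0
        System.out.println(Arrays.toString(sorted("pear", "apple", "fig"))); // [apple, fig, pear]
    }
}
```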

2. Fields

    /** The value is used for character storage. */
    private final char value[];

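This is a final char array that stores the string's content. The final modifier only prevents the field from pointing to a different array; the content stays fixed because the array is private and String never modifies or exposes it after initialization. Code such as String s = "a"; s = "b"; is common, but s is only a reference to an object (it holds the object's address), so the second assignment does not modify the original object. It just makes s point to a different object.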
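The reassignment point is easy to check with a second reference to the original object. This is only a sketch; ReassignDemo is a made-up class name.

```java
class ReassignDemo {
    static String[] run() {
        String s = "a";
        String alias = s; // second reference to the same "a" object
        s = "b";          // rebinds s only; the "a" object is untouched
        return new String[] { s, alias };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0]); // b
        System.out.println(result[1]); // a
    }
}
```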

    /** Cache the hash code for the string */
    private int hash; // Default to 0

Caches the string's hash code (0 by default).

    private static final long serialVersionUID = -6849794470754667710L;

    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

Because String implements Serializable, it supports serialization and deserialization.

3. Constructors

1. Constructing a String from a char array or another String

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

Assigns the original string's value and hash directly to the new String, so both objects share the same underlying array.

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

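Arrays.copyOf() copies the contents of the given char array into the String's own value array, so later changes to the caller's array do not affect the String.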
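The effect of that copy can be observed from outside the class. A minimal sketch, with DefensiveCopyDemo as a hypothetical name:

```java
class DefensiveCopyDemo {
    static String build() {
        char[] data = {'a', 'b', 'c'};
        String s = new String(data); // copies the array contents
        data[0] = 'x';               // mutate the caller's array afterwards
        return s;                    // the String still holds the old content
    }

    public static void main(String[] args) {
        System.out.println(build()); // abc
    }
}
```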

2. Constructing a String from a byte array

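Bytes are the serialized form used for network transmission and storage, so byte[] and String often have to be converted into each other. String therefore provides many overloaded constructors that take a byte array, and any byte[]/String conversion has to take the character encoding into account.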

    public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }

This constructor uses the given charset to decode the byte array into a Unicode char[] array and builds a new String from it.
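Why the charset matters can be shown with a round trip. The sketch below encodes with getBytes(Charset) and decodes with the constructor above, once with the matching charset and once with a mismatched one; CharsetRoundTripDemo and its method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTripDemo {
    // Encode and decode with the same charset: the text survives.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes as ISO-8859-1: each byte becomes its own character.
    static String decodeWrong(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE, two bytes in UTF-8
        System.out.println(e.getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println(roundTrip(e).equals(e));                    // true
        System.out.println(decodeWrong(e).length());                   // 2
    }
}
```

Passing a Charset constant instead of a charset name also avoids the checked UnsupportedEncodingException and typos in the name.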

3. Constructing a String from a StringBuffer/StringBuilder

    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

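These constructors are rarely used, though; with a StringBuffer/StringBuilder in hand, you normally call its toString() method.
One more point: StringBuffer.toString() is synchronized, trading some speed for thread safety, so StringBuilder.toString() is somewhat faster.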
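Both routes produce the same content, and either way the resulting String is a snapshot that later appends do not change. A small sketch (BuilderToStringDemo is a made-up name):

```java
class BuilderToStringDemo {
    static String[] run() {
        StringBuilder sb = new StringBuilder();
        sb.append("abc").append(123);
        String viaToString = sb.toString();     // the usual way
        String viaConstructor = new String(sb); // the constructor shown above
        sb.append("!");                         // does not affect the two Strings
        return new String[] { viaToString, viaConstructor, sb.toString() };
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s); // abc123, abc123, abc123!
        }
    }
}
```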

4. Other Methods

1. equals()

The implementation is written for efficiency: it performs the cheapest checks first.

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }

It first checks whether the argument is the very same object as this one; if so, it returns true immediately.

        if (anObject instanceof String) {

Next it checks whether the argument is a String. If not, it returns false; otherwise the comparison continues.

            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {

Then it compares the lengths. Only if they are equal does the comparison continue; otherwise it returns false.

                char v1[] = value;               // this string's character array
                char v2[] = anotherString.value; // the other string's character array
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

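Only as the last step are the characters compared one by one.
Ordering the checks this way lets many comparisons finish without scanning the character arrays at all.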
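Each branch of equals() can be exercised from the outside. The sketch below is only an illustration (EqualsDemo is a hypothetical name) and also contrasts equals with the == identity check.

```java
class EqualsDemo {
    static boolean[] run() {
        String a = "abc";
        String b = new String("abc");                  // distinct object, same characters
        Object notAString = new StringBuilder("abc");  // same text, different type
        return new boolean[] {
            a == b,               // identity: different objects
            a.equals(b),          // same length and same characters
            a.equals(notAString), // fails the instanceof check
            a.equals("abcd")      // fails the length check
        };
    }

    public static void main(String[] args) {
        for (boolean r : run()) {
            System.out.println(r); // false, true, false, false
        }
    }
}
```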

2. equalsIgnoreCase()

Compares two strings, ignoring case.

    public boolean equalsIgnoreCase(String anotherString) {
        return (this == anotherString) ? true
                : (anotherString != null)
                && (anotherString.value.length == value.length)
                && regionMatches(true, 0, anotherString, 0, value.length);
    }

The regionMatches method it calls is declared as:

    public boolean regionMatches(boolean ignoreCase, int toffset,
            String other, int ooffset, int len)

regionMatches compares a region of another string with a region of this string, optionally ignoring case (when ignoreCase is true).
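A short sketch of both methods in use; RegionMatchesDemo is a made-up class name, while the calls are the public String methods quoted above.

```java
class RegionMatchesDemo {
    static boolean[] run() {
        String s = "Hello World";
        return new boolean[] {
            s.regionMatches(true, 6, "WORLD", 0, 5),  // ignore case: "World" vs "WORLD"
            s.regionMatches(false, 6, "WORLD", 0, 5), // case-sensitive comparison
            "Hello".equalsIgnoreCase("hELLO"),
            "Hello".equalsIgnoreCase(null)            // null is handled, no exception
        };
    }

    public static void main(String[] args) {
        for (boolean r : run()) {
            System.out.println(r); // true, false, true, false
        }
    }
}
```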

3. hashCode()

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

This method produces the hash code for a String.

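String overrides hashCode(); the version in Object is a native call. String's hash is computed as a polynomial, and different strings can produce the same value, so two String objects having the same hashCode does not mean the two strings are equal.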

Another way to see this:

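In Java an int has 32 bits, so there are at most 2^32 = 4294967296 distinct int values, and the hashCode of any string is one of them. By the pigeonhole principle, among any 4294967296 + 1 distinct strings at least two must share a hash code. So there certainly exist two different strings with the same hash code, and equal hash codes do not imply equal strings. The hash code is computed as: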

      s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

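s[i] is the i-th character of the string, n is its length, and ^ denotes exponentiation. The hash code of the empty string is 0.

So why 31? Three reasons are usually given:
1. 31 is a prime.
2. 31 is large enough.
3. 31 is not too large.

In more detail:
1. Why a prime?
- A prime is divisible only by 1 and itself, which is the traditional reason for choosing a prime multiplier in hash functions.
- 31 is also odd. With an even multiplier, every multiplication would shift bits out of the 32-bit int on overflow and lose information.
2. Why large enough?
- When computing hash values we want to reduce duplicates, that is, collisions.
- A very small multiplier would squeeze short strings into a narrow range of values, so collisions would be more likely and lookups slower.
3. Why not too large?
- The multiplication is done in 32-bit int arithmetic, so a large multiplier overflows sooner and discards high-order bits.
- 31 * h can also be computed cheaply as (h << 5) - h.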
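The formula and the collision argument can both be checked directly. In the sketch below, polynomialHash is a hand-written reimplementation of the loop shown earlier (HashDemo and the method name are made up); "Aa" and "BB" are a well-known colliding pair, since 65*31 + 97 = 66*31 + 66 = 2112.

```java
class HashDemo {
    // Hand-written version of s[0]*31^(n-1) + ... + s[n-1], using Horner's rule.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("hello") == "hello".hashCode()); // true
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println("Aa".equals("BB"));    // false
        System.out.println("".hashCode());        // 0
        int h = 123456789;
        System.out.println(31 * h == (h << 5) - h); // true, even when the product overflows
    }
}
```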

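4. getBytes()

Purpose: converts the String into a byte[]. Because no encoding is specified, the no-argument overload encodes the string with the platform's default charset. It is therefore better to specify the encoding explicitly, for example getBytes("UTF-8").

5. substring()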

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

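substring creates and returns a new String through String(value, beginIndex, subLen), which copies the selected range of the original char[] into the new String. The copy costs some performance, but it avoids the memory-leak problem.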
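The branches of the method above map to three observable cases: a normal call, beginIndex == 0, and an out-of-range index. A sketch with hypothetical names (SubstringDemo, outOfRange); the identity result for substring(0) follows from the source shown above and is an implementation detail rather than a documented guarantee.

```java
class SubstringDemo {
    // Hypothetical helper: reports whether substring(begin) rejects the index.
    static boolean outOfRange(String s, int begin) {
        try {
            s.substring(begin);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.substring(2));      // llo
        System.out.println(s.substring(5));      // empty string: beginIndex == length is allowed
        System.out.println(s.substring(0) == s); // same object, per the source above
        System.out.println(outOfRange(s, 6));    // true
        System.out.println(outOfRange(s, -1));   // true
    }
}
```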

6. Differences between replaceFirst, replaceAll, and replace

public String replaceFirst(String regex, String replacement)

replaceFirst performs a regular-expression replacement and replaces only the first match.

public String replaceAll(String regex, String replacement)

replaceAll is also based on regular expressions. For example, replaceAll("\\d", "*") replaces every digit in a string with an asterisk.

public String replace(CharSequence target, CharSequence replacement)

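replace works on literal characters/character sequences rather than regular expressions, and replaces every occurrence.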
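The literal-versus-regex distinction is easiest to see with a character such as ".", which matches any character in a regular expression. A minimal sketch (ReplaceDemo is a made-up class name):

```java
class ReplaceDemo {
    static String[] run() {
        String s = "a.b.c";
        return new String[] {
            s.replace(".", "-"),         // literal: every "." replaced
            s.replaceAll(".", "-"),      // regex: "." matches every character
            s.replaceAll("\\.", "-"),    // regex with the dot escaped
            s.replaceFirst("\\.", "-"),  // regex, first match only
            "a1b22".replaceAll("\\d", "*")
        };
    }

    public static void main(String[] args) {
        for (String r : run()) {
            System.out.println(r); // a-b-c, -----, a-b-c, a-b.c, a*b**
        }
    }
}
```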

7. String's overloading of "+"

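Java does not support user-defined operator overloading; the + operator on String (together with +=) is the only operator the language itself overloads for an object type.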

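String's support for + is really implemented with StringBuilder and its append and toString methods. (This is what older compilers emit; since JDK 9, javac generates an invokedynamic call for string concatenation instead.)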

public static void main(String[] args) {
    String string="hollis";
    String string2 = string + "chuang";
}

After decompiling:

public static void main(String args[]){
   String string = "hollis";
   String string2 = (new StringBuilder(String.valueOf(string))).append("chuang").toString();
}

Now look at the following code:

    public static void main(String[] args) {
        String str = "abc";
        String new_str1 = str + "def";
        String new_str2 = "abc" + "def";
        System.out.println(new_str1);
        System.out.println(new_str2);
    }

Both print: abcdef

But their …
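One observable difference between the two variables is object identity. "abc" + "def" is a compile-time constant expression, so the compiler folds it into the single literal "abcdef", while str + "def" involves a non-final variable and is built at run time. The sketch below assumes nothing beyond that language rule; ConcatIdentityDemo is a made-up name.

```java
class ConcatIdentityDemo {
    static boolean[] run() {
        String str = "abc";
        String newStr1 = str + "def";   // not a constant expression: new object at run time
        String newStr2 = "abc" + "def"; // constant expression: same object as "abcdef"
        return new boolean[] {
            newStr1.equals(newStr2), // same characters
            newStr2 == "abcdef",     // same pooled literal
            newStr1 == "abcdef"      // different object
        };
    }

    public static void main(String[] args) {
        for (boolean r : run()) {
            System.out.println(r); // true, true, false
        }
    }
}
```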

8. Difference between String.valueOf and Integer.toString

int i = 5;
String i1 = "" + i;
String i2 = String.valueOf(i);
String i3 = Integer.toString(i);

All three approaches convert the int i to a String. What is the difference?

There is effectively no difference between i2 and i3, because String.valueOf(i) is itself implemented by calling Integer.toString(i).

i1 is implemented differently. It is actually:

String i1 = (new StringBuilder()).append(i).toString();

That is, a StringBuilder is created first, append() adds i to it, and finally toString() is called.
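Whatever the implementation path, the three conversions yield the same text. A small sketch to confirm this (IntToStringDemo is a hypothetical name):

```java
class IntToStringDemo {
    static String[] run(int i) {
        return new String[] {
            "" + i,              // concatenation
            String.valueOf(i),   // delegates to Integer.toString(i)
            Integer.toString(i)
        };
    }

    public static void main(String[] args) {
        for (String s : run(5)) {
            System.out.println(s); // 5, three times
        }
    }
}
```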

9. compareTo()

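Purpose:
- Returns 0 if this string is equal to the argument.
- Returns a negative value if this string is lexicographically less than the argument.
- Returns a positive value if this string is lexicographically greater than the argument.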

    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);

Find the smaller of the two strings' lengths.

        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

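If a differing character is found, the method returns this string's character minus the argument's character at that position. Otherwise one string is a prefix of the other and the method returns the difference in lengths, which is 0 only when the two strings are equal.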
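Because the result is a difference rather than a fixed -1/0/1, callers should test only its sign. The sketch below follows the source shown above; CompareToDemo and sign are made-up names.

```java
class CompareToDemo {
    // Hypothetical helper: reduces a compareTo result to -1, 0, or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println("abc".compareTo("abd"));   // -1  ('c' - 'd')
        System.out.println("a".compareTo("d"));       // -3  ('a' - 'd')
        System.out.println("abc".compareTo("abcde")); // -2  (length 3 - length 5)
        System.out.println("abc".compareTo("abc"));   // 0
        System.out.println(sign("a", "d"));           // -1
    }
}
```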

10. trim()

Removes leading and trailing whitespace (every character less than or equal to ' ').

    public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

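        // Scan forward to the first character greater than ' '
        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        // Scan backward to the last character greater than ' '
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        // If nothing was trimmed, return this string itself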
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
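The `<= ' '` test means trim() strips tabs, newlines, and other control characters, but not Unicode spaces above U+0020 such as the ideographic space U+3000. A short sketch (TrimDemo is a made-up name):

```java
class TrimDemo {
    static String[] run() {
        return new String[] {
            " \t hello \n".trim(),     // tab and newline are <= ' ', so they are removed
            "a b".trim(),              // inner whitespace is untouched
            "\u3000hi\u3000".trim()    // U+3000 is greater than ' ', so it stays
        };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0]);          // hello
        System.out.println(r[1]);          // a b
        System.out.println(r[2].length()); // 4
    }
}
```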

11. intern()

public native String intern();

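This is a native method. Calling it adds the string to the constant pool if an equal string is not already there, which makes use of the pool's dynamic nature (new constants can be added at run time, not only from the class file).
Note, however, that in JDK 1.6 and earlier the constant pool lived in the method area, whereas from JDK 1.7 on it was moved to the heap.
Consider the following code: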

    public static void main(String[] args) {
        String str0 = new String("abc") + new String("def");
        str0.intern();
        String str1 = "abcdef";

        System.out.println("str0 == str1: "+ (str0 == str1));
    }

On JDK 1.6 the output is:

str0 == str1: false

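The execution proceeds as follows:
1. Memory is allocated on the heap for "abc", "def", and "abcdef".
2. When intern() is called, "abcdef" is not found in the constant pool, so a copy of str0's string "abcdef" is placed in the constant pool in the method area.
3. str1 first looks for "abcdef" in the constant pool and, finding it, points to that pooled string. str0 still points to the heap object, so str0 and str1 refer to different addresses.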

On JDK 1.7 the output is:

str0 == str1: true

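The execution proceeds as follows:
1. Memory is allocated on the heap for "abc", "def", and "abcdef".
2. When intern() is called, "abcdef" is not found in the constant pool, so the pool simply records a reference to str0's object on the heap.
3. str1 first looks for "abcdef" in the constant pool and, finding it, points to that pooled entry, which is the reference to str0. So str0 and str1 refer to the same address.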
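The example above depends on the JDK version and on "abcdef" not being in the pool yet. A version-independent way to see what intern() returns is to start from a literal that is already pooled, as in this sketch (InternDemo is a made-up name):

```java
class InternDemo {
    static boolean[] run() {
        String literal = "hello";            // pooled when the literal is first used
        String copy = new String("hello");   // separate object on the heap
        return new boolean[] {
            copy == literal,          // different objects
            copy.intern() == literal, // intern() returns the pooled instance
            copy.equals(literal)      // same characters either way
        };
    }

    public static void main(String[] args) {
        for (boolean r : run()) {
            System.out.println(r); // false, true, true
        }
    }
}
```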