A Deep Dive into String in JDK 8: Source-Level Analysis and Practical Exploration

Originally published 2025-06-26 00:03:19


Licensed under CC 4.0 BY-SA


I. Overview of the String Class

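String is one of the most frequently used classes in Java. In JDK 8 it lives in the java.lang package, so it can be used without an import. It implements java.io.Serializable, Comparable<String>, and CharSequence, which means a String can be serialized, compared with other strings, and treated as a read-only sequence of characters.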

(1) Characteristics of the String Class

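Immutability: String is declared final, so it cannot be subclassed. Its internal char array value, which stores the characters, is private and final, and no public method exposes or modifies that array. Together these guarantee that the content of a String object cannot change once it has been created. For example: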

String str = "hello"; 
str = "world";

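In the code above, str appears to change from "hello" to "world", but in fact str now refers to a different String object; the original "hello" object still exists in memory, unchanged.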
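The difference between rebinding a variable and modifying an object can be sketched as follows. The class and method names are invented for this illustration; only the String calls come from the JDK.

```java
import java.util.Locale;

final class ImmutabilityDemo {
    /** Returns what a second reference still sees after the first variable is reassigned. */
    static String aliasAfterReassign() {
        String str = "hello";
        String alias = str;   // second reference to the same "hello" object
        str = "world";        // rebinds str; the "hello" object itself is untouched
        return alias;
    }

    /** "Modifying" methods return a new object and leave the receiver as it was. */
    static boolean upperCaseLeavesOriginalIntact() {
        String original = "hello";
        String upper = original.toUpperCase(Locale.ROOT);
        return original.equals("hello") && upper.equals("HELLO") && upper != original;
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterReassign());            // hello
        System.out.println(upperCaseLeavesOriginalIntact()); // true
    }
}
```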

Sharing: because a String is immutable, the same string object can safely be shared by multiple references. For example:

String s1 = "abc"; 
String s2 = "abc";

Here s1 and s2 refer to the same "abc" object in the string constant pool.

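Serialization support: String implements Serializable, so String objects can be serialized and deserialized for network transfer or persistent storage.
Comparability: String implements Comparable<String>, so compareTo can be used to order two String objects lexicographically (by char value).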
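A short sketch of the comparability point: compareTo returns a negative, zero, or positive int, and it is the ordering Arrays.sort uses for String arrays. CompareDemo and sorted are hypothetical names chosen for this example.

```java
import java.util.Arrays;

final class CompareDemo {
    /** Returns a sorted copy; Arrays.sort relies on String.compareTo (natural ordering). */
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("apple".compareTo("banana") < 0); // true: 'a' comes before 'b'
        System.out.println("abc".compareTo("abcd") < 0);     // true: a prefix sorts first
        // Uppercase letters have smaller char values than lowercase ones.
        System.out.println(Arrays.toString(sorted("banana", "apple", "Zebra"))); // [Zebra, apple, banana]
    }
}
```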

(2) Memory Layout of the String Class

1. The String Constant Pool (String Table)

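In JDK 8 the string constant pool is located in the heap, is shared by all threads, and is managed by the garbage collector (GC). For strings defined as literals, the JVM keeps only one String object per distinct character sequence (same characters, same order, same case) in the pool, no matter how many times the literal appears in the program. For example: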

String s3 = "abc"; 
String s4 = "abc";

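Here s3 and s4 refer to the same "abc" object in the string constant pool. A String created with the new keyword, by contrast, always allocates a new object on the heap, even if the content is identical. For example: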

char[] chs = {'a', 'b', 'c'}; 
String s1 = new String(chs); 
String s2 = new String(chs);

Although s1 and s2 have the same content, they are two distinct objects on the heap.
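Both cases can be checked with reference comparison (==), which tests object identity rather than content. This is a minimal sketch; PoolIdentityDemo and its methods are names made up for the example.

```java
final class PoolIdentityDemo {
    /** Two identical literals resolve to one pooled object. */
    static boolean literalsShareOneObject() {
        String s3 = "abc";
        String s4 = "abc";
        return s3 == s4;
    }

    /** Each new String(...) allocates its own object, even for equal content. */
    static boolean newAlwaysAllocates() {
        char[] chs = {'a', 'b', 'c'};
        String s1 = new String(chs);
        String s2 = new String(chs);
        return s1 != s2 && s1.equals(s2);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject()); // true
        System.out.println(newAlwaysAllocates());     // true
    }
}
```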

2. Interview question: how many objects does `String s = new String("abc");` create in memory?

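The usual answer is two: the String object that new allocates on the heap, and the "abc" literal object in the string constant pool (together with its backing char[]). Strictly speaking, the pooled "abc" is created only if it is not already in the pool; if the literal was resolved earlier, the statement creates just one new object. In JDK 8 the new object does not copy the characters: its value field refers to the same char[] as the literal.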

3. The `intern()` method

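intern() places a string in the string constant pool on demand. If the pool already contains a string equal to this String object (as determined by equals), the pooled string is returned; otherwise this String object is added to the pool and a reference to it is returned. For example: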

String s1 = new String("abc"); 
String s4 = s1.intern();

Here s4 refers to the "abc" object in the string constant pool, while s1 still refers to the separate object on the heap.
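The relationship between the heap object, the pooled object, and the value returned by intern() can be checked as below. InternDemo and check are hypothetical names used only for this sketch.

```java
final class InternDemo {
    /** Returns { s1 == literal, s4 == literal, s1 == s4 }. */
    static boolean[] check() {
        String literal = "abc";          // pooled object
        String s1 = new String("abc");   // separate heap object with equal content
        String s4 = s1.intern();         // finds the equal pooled string and returns it
        return new boolean[] { s1 == literal, s4 == literal, s1 == s4 };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true false
    }
}
```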

II. JDK 8 String Source Code Analysis

(1) Class Definition and Fields

public final class String 
    implements java.io.Serializable, Comparable<String>, CharSequence { 
    /** The value is used for character storage. */ 
    private final char value[]; 
    /** Cache the hash code for the string */ 
    private int hash; // Default to 0 
}

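The value array: stores the characters of the String. Because it is final, the array reference cannot be reassigned; because it is private and the class provides no getter or setter for it, code outside the class cannot reach the array. As a result, the content of a String cannot be modified.
The hash field: caches the string's hash code and defaults to 0. When hashCode is called and hash is still 0, the hash code is computed and stored in hash.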

(2) Common Constructors

1. No-argument constructor

public String() { 
    this.value = "".value; 
}

The no-argument constructor creates an empty String whose value field refers to the value array of the empty string literal.

2. Constructor taking a `String`

public String(String original) { 
    this.value = original.value; 
    this.hash = original.hash; 
}

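This constructor assigns the value array and hash of the given String to the new String, so the two objects share the same char array. Sharing is safe because the array is never modified.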

3. Constructor taking a `char` array

public String(char value[]) { 
    this.value = Arrays.copyOf(value, value.length); 
}

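This constructor copies the given char array into a new array and assigns the copy to value. The copy ensures that later changes to the caller's array cannot affect the content of the String.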
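The effect of that defensive copy is easy to observe: mutating the source array after construction leaves the String unchanged. A minimal sketch, with DefensiveCopyDemo as an invented name:

```java
final class DefensiveCopyDemo {
    /** Builds a String from an array, then mutates the array. */
    static String build() {
        char[] chs = {'a', 'b', 'c'};
        String s = new String(chs); // the constructor copies chs
        chs[0] = 'X';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(build()); // abc
    }
}
```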

4. Constructor taking a `char` array, an offset, and a count

public String(char value[], int offset, int count) { 
    if (offset < 0) { 
        throw new StringIndexOutOfBoundsException(offset); 
    } 
    if (count <= 0) { 
        if (count < 0) { 
            throw new StringIndexOutOfBoundsException(count); 
        } 
        if (offset <= value.length) { 
            this.value = "".value; 
            return; 
        } 
    } 
    // Note: offset or count might be near -1>>>1. 
    if (offset > value.length - count) { 
        throw new StringIndexOutOfBoundsException(offset + count); 
    } 
    this.value = Arrays.copyOfRange(value, offset, offset + count); 
}

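This constructor copies count characters from the given char array, starting at index offset, into a new array and assigns it to value. It validates its arguments first and throws StringIndexOutOfBoundsException if they are invalid.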
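The bounds checks can be exercised with a few edge cases. This is a sketch under the behavior shown in the source above; RangeConstructorDemo, slice, and rejects are hypothetical helper names.

```java
final class RangeConstructorDemo {
    /** Thin wrapper around new String(char[], int, int). */
    static String slice(char[] src, int offset, int count) {
        return new String(src, offset, count);
    }

    /** Returns true if the constructor rejects the given offset and count. */
    static boolean rejects(char[] src, int offset, int count) {
        try {
            new String(src, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chs = "hello".toCharArray();
        System.out.println(slice(chs, 1, 3));           // ell
        System.out.println(slice(chs, 5, 0).isEmpty()); // true: offset == length with count 0 is allowed
        System.out.println(rejects(chs, -1, 2));        // true: negative offset
        System.out.println(rejects(chs, 3, 5));         // true: range runs past the end
    }
}
```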

(3) Common Methods

1. The `equals` method

public boolean equals(Object anObject) { 
    if (this == anObject) { 
        return true; 
    } 
    if (anObject instanceof String) { 
        String anotherString = (String)anObject; 
        int n = value.length; 
        if (n == anotherString.value.length) { 
            char v1[] = value; 
            char v2[] = anotherString.value; 
            int i = 0; 
            while (n-- != 0) { 
                if (v1[i] != v2[i]) 
                    return false; 
                i++; 
            } 
            return true; 
        } 
    } 
    return false; 
}

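equals compares the content of two String objects. It first checks whether the two references are identical and, if so, returns true immediately. Otherwise it checks whether the argument is a String; if it is, it compares the lengths and then the characters one by one. For any argument that is not a String (including null) it returns false.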
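The branches described above can be seen side by side in a small sketch (EqualsDemo is a made-up name). Note the StringBuilder case: the characters match, but the instanceof check fails.

```java
final class EqualsDemo {
    /** Returns { a == b, a.equals(b), a.equals(sb), a.equals(null) }. */
    static boolean[] check() {
        String a = "abc";
        String b = new String("abc");          // equal content, different object
        Object sb = new StringBuilder("abc");  // same characters, but not a String
        return new boolean[] { a == b, a.equals(b), a.equals(sb), a.equals(null) };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // false true false false
    }
}
```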

2. The `hashCode` method

public int hashCode() { 
    int h = hash; 
    if (h == 0 && value.length > 0) { 
        char val[] = value; 
        for (int i = 0; i < value.length; i++) { 
            h = 31 * h + val[i]; 
        } 
        hash = h; 
    } 
    return h; 
}

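hashCode computes the hash code of the String. If hash is 0 and the value array is not empty, the hash code is computed and cached in hash. Each step applies h = 31 * h + val[i], which for a string s of length n yields s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic (overflow wraps around). The multiplier 31 is an odd prime, which tends to spread hash values reasonably well, and 31 * h can be computed cheaply as (h << 5) - h. Note that a non-empty string whose hash code happens to be 0 is rehashed on every call, because 0 also serves as the "not yet computed" marker.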
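The formula can be reproduced by hand and compared with the library result. HashDemo and manualHash are names invented for this sketch; the loop mirrors the one in the source above.

```java
final class HashDemo {
    /** Recomputes the hash with the same recurrence as String.hashCode. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps, as in the JDK
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97*31*31 + 98*31 + 99 = 96354
        System.out.println(manualHash("abc"));                     // 96354
        System.out.println(manualHash("abc") == "abc".hashCode()); // true
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h);                // true
    }
}
```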

3. The `substring` method

public String substring(int beginIndex) { 
    if (beginIndex < 0) { 
        throw new StringIndexOutOfBoundsException(beginIndex); 
    } 
    int subLen = value.length - beginIndex; 
    if (subLen < 0) { 
        throw new StringIndexOutOfBoundsException(subLen); 
    } 
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen); 
} 

public String substring(int beginIndex, int endIndex) { 
    if (beginIndex < 0) { 
        throw new StringIndexOutOfBoundsException(beginIndex); 
    } 
    if (endIndex > value.length) { 
        throw new StringIndexOutOfBoundsException(endIndex); 
    } 
    int subLen = endIndex - beginIndex; 
    if (subLen < 0) { 
        throw new StringIndexOutOfBoundsException(subLen); 
    } 
    return ((beginIndex == 0) && (endIndex == value.length)) ? this 
            : new String(value, beginIndex, subLen); 
}

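substring returns a substring of the String. It validates the begin and end indices and throws StringIndexOutOfBoundsException if they are invalid. If the requested range covers the whole string, the original String is returned; otherwise a new String is created, and the constructor it calls copies the selected characters into a new array.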
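A sketch of the three outcomes: same object for the full range, a new String for a proper sub-range, and an exception for bad indices. SubstringDemo is a hypothetical name, and the identity check reflects the implementation shown above rather than a guarantee of the API documentation.

```java
final class SubstringDemo {
    /** The full range returns the receiver itself in the implementation shown above. */
    static boolean wholeRangeReturnsSameObject() {
        String s = "hello";
        return s.substring(0) == s && s.substring(0, s.length()) == s;
    }

    /** Returns true if substring rejects the given begin index. */
    static boolean rejects(String s, int beginIndex) {
        try {
            s.substring(beginIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wholeRangeReturnsSameObject()); // true
        System.out.println("hello".substring(1, 3));       // el
        System.out.println("hello".substring(5).isEmpty()); // true: begin == length is allowed
        System.out.println(rejects("hello", 6));           // true
    }
}
```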

4. The `concat` method

public String concat(String str) { 
    int otherLen = str.length(); 
    if (otherLen == 0) { 
        return this; 
    } 
    int len = value.length; 
    char buf[] = Arrays.copyOf(value, len + otherLen); 
    str.getChars(buf, len); 
    return new String(buf, true); 
}

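concat appends the specified string to the end of this string. If the specified string is empty, this string is returned as is; otherwise a new char array large enough for both is created, the characters of both strings are copied into it, and a new String wrapping that array is returned. The String(buf, true) call is a package-private constructor that adopts buf directly instead of copying it again.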
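Both paths through concat can be observed from outside. ConcatDemo is an invented name; the null case follows from the str.length() call on the first line of the method.

```java
final class ConcatDemo {
    /** An empty argument short-circuits and returns the receiver. */
    static boolean emptyArgumentReturnsThis() {
        String s = "a";
        return s.concat("") == s;
    }

    /** Returns true if concat(null) throws NullPointerException. */
    static boolean nullArgumentThrows() {
        try {
            "a".concat(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(emptyArgumentReturnsThis()); // true
        System.out.println("a".concat("b"));            // ab
        System.out.println(nullArgumentThrows());       // true
    }
}
```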

5. The `replace` method

public String replace(char oldChar, char newChar) { 
    if (oldChar != newChar) { 
        int len = value.length; 
        int i = -1; 
        char[] val = value; /* avoid getfield opcode */ 
        while (++i < len) { 
            if (val[i] == oldChar) { 
                break; 
            } 
        } 
        if (i < len) { 
            char buf[] = new char[len]; 
            for (int j = 0; j < i; j++) { 
                buf[j] = val[j]; 
            } 
            while (i < len) { 
                char c = val[i]; 
                buf[i] = (c == oldChar) ? newChar : c; 
                i++; 
            } 
            return new String(buf, true); 
        } 
    } 
    return this; 
}

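replace(char, char) replaces every occurrence of oldChar in this string with newChar. If oldChar equals newChar, or oldChar does not occur in the string, this string is returned unchanged. Otherwise the method locates the first occurrence of oldChar, allocates a new char array, copies the characters before that position unchanged, copies the remaining characters while substituting newChar for each oldChar, and returns a new String wrapping the new array.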
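The early-return paths and the replacing path can be checked like this (ReplaceDemo is a hypothetical name for the sketch):

```java
final class ReplaceDemo {
    /** No allocation happens when nothing would change: the receiver is returned. */
    static boolean unchangedReturnsThis() {
        String s = "hello";
        return s.replace('x', 'y') == s   // oldChar does not occur
            && s.replace('l', 'l') == s;  // oldChar equals newChar
    }

    public static void main(String[] args) {
        System.out.println(unchangedReturnsThis());      // true
        System.out.println("hello".replace('l', 'L'));   // heLLo
    }
}
```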

III. Ways to Create String Objects

(1) String literal

String s1 = "abc";

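With a literal, the JVM first looks for an "abc" object in the string constant pool. If one exists, its reference is returned; if not, an "abc" object is created in the pool and its reference is returned.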

(2) The `new` keyword

String s2 = new String("abc");

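This creates a new String object on the heap. The "abc" literal is also resolved against the string constant pool, and an "abc" object is created there if none exists yet. In the end s2 refers to the new object on the heap.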

(3) Manual pooling with `intern`

String s3 = new String("abc").intern();

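Here a new "abc" object is first created on the heap, and intern is then called on it. Because the literal "abc" has already been placed in the string constant pool by the time intern runs, intern finds the equal pooled string and s3 refers to that pooled object. Only when no equal string is in the pool (for example, for a string assembled at run time) does intern add the heap object itself to the pool and return it.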

IV. Concatenating String Objects

(1) Concatenation with the `+` operator

String s1 = "a"; 
String s2 = "b"; 
String s3 = s1 + s2;

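When the operands are not all compile-time constants, the JDK 8 compiler (javac) translates + concatenation into StringBuilder append calls. The code above is compiled to roughly the following: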

String s1 = "a"; 
String s2 = "b"; 
StringBuilder sb = new StringBuilder(); 
sb.append(s1); 
sb.append(s2); 
String s3 = sb.toString();

If all operands are constant strings, the compiler folds the concatenation into a single string at compile time. For example:

String s4 = "a" + "b";

Here s4 is equivalent to String s4 = "ab"; and the "ab" object is taken directly from the string constant pool.
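The difference between run-time concatenation and compile-time folding shows up in reference comparisons against the literal "ab". In this sketch (ConstantFoldingDemo is an invented name) the final local variable is an addition not covered in the text above: a final String initialized with a constant counts as a constant, so it is folded as well.

```java
final class ConstantFoldingDemo {
    /** Returns { folded == "ab", runtime == "ab", runtime.equals("ab"), foldedFinal == "ab" }. */
    static boolean[] check() {
        String s1 = "a";
        String s2 = "b";
        String runtime = s1 + s2;       // built at run time: a new object
        String folded = "a" + "b";      // folded by the compiler into the literal "ab"
        final String fa = "a";          // constant variable
        String foldedFinal = fa + "b";  // also a constant expression, so also folded
        return new boolean[] {
            folded == "ab", runtime == "ab", runtime.equals("ab"), foldedFinal == "ab"
        };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // true false true true
    }
}
```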

(2) Concatenation with the `concat` method

String s1 = "a"; 
String s2 = "b"; 
String s3 = s1.concat(s2);

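concat creates a new String object for the concatenated result. Unlike + concatenation, concat does not use a StringBuilder; it copies both character arrays directly into a new array.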

(3) Concatenation with `String.join`

JDK 8 introduced String.join to make joining strings convenient. For example:

String[] words = {"Hello", "World"}; 
String result = String.join(" ", words);

String.join concatenates the strings in the array into one new string, inserting the given delimiter (here a space) between adjacent elements.
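String.join also accepts individual arguments and any Iterable of CharSequence, and a null element is rendered as the text "null". A small sketch, with JoinDemo and its methods as made-up names:

```java
import java.util.Arrays;
import java.util.List;

final class JoinDemo {
    /** Varargs overload: elements are passed directly. */
    static String fromVarargs() {
        return String.join("-", "2025", "06", "26");
    }

    /** Iterable overload: works with any collection of CharSequence. */
    static String fromList() {
        List<String> words = Arrays.asList("Hello", "World");
        return String.join(" ", words);
    }

    /** A null element is appended as the four characters "null". */
    static String withNullElement() {
        return String.join(",", "a", null, "b");
    }

    public static void main(String[] args) {
        System.out.println(fromVarargs());     // 2025-06-26
        System.out.println(fromList());        // Hello World
        System.out.println(withNullElement()); // a,null,b
    }
}
```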

V. String Compared with Related Classes

(1) Differences between `String`, `StringBuffer`, and `StringBuilder`

1. Mutability

String is immutable: once created, its value cannot be changed. StringBuffer and StringBuilder are mutable and allow the content to be modified in place.

2. Thread safety

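StringBuffer is thread-safe: its public methods are synchronized. StringBuilder is not thread-safe but is faster than StringBuffer because it avoids the synchronization overhead. Prefer StringBuilder when the builder is used by a single thread; use StringBuffer when the same instance is modified by multiple threads.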

3. Use cases

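If the string value does not change, use String. If the string is modified frequently by a single thread, StringBuilder is the best choice. If the same buffer must be modified safely by multiple threads, use StringBuffer.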
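The two use cases might look like the sketch below. BuilderDemo, buildDigits, and sharedBufferLength are hypothetical names; the thread count and loop sizes are arbitrary. Because every StringBuffer.append call is synchronized, no append is lost and the final length is exactly threads * appendsPerThread.

```java
final class BuilderDemo {
    /** Single-threaded building in a loop: StringBuilder avoids creating a new String per step. */
    static String buildDigits(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb.toString();
    }

    /** Several threads append to one shared StringBuffer; returns its final length. */
    static int sharedBufferLength(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // synchronized, so appends do not interfere
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait for every worker before reading the length
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(buildDigits(12));              // 012345678901
        System.out.println(sharedBufferLength(4, 1000));  // 4000
    }
}
```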

(2) How JDK versions affect the `String` class

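JDK 6 and earlier: a String object has four main fields: the char array, the offset, the character count, and the hash. The offset and count locate the string's characters within the char[] array. Because substring shares the parent's char[], a small substring can keep a large array reachable, which can cause memory leaks.
JDK 7 to JDK 8: the offset and count fields are gone (the change arrived in JDK 7 update 6), so a String object uses slightly less memory, and substring no longer shares the char[], which removes the memory leak described above.
JDK 9 and later: the char[] field is replaced by a byte[] field, and a new coder field records the encoding (LATIN1 or UTF16). The goal is to save memory: most strings contain only characters that fit in a single byte (Latin-1), so storing them one byte per character roughly halves the space used by the character data.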