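1. String immutability
- String: represents a string, written as text enclosed in a pair of double quotes (""), e.g. String s1 = "丁总"; or String s2 = new String("丁总");
- String is declared final, so it cannot be subclassed.
- String implements the Serializable interface, which means strings support serialization. It also implements the Comparable interface, which means strings can be compared and ordered.
- In JDK 8 and earlier, String stored its data in an internal final char[] value field; JDK 9 changed this field to byte[] (see JEP 254 below).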
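The declarations in the list above can be checked at run time with reflection. A minimal sketch (the class and method names are hypothetical) that asks the JVM whether String is final, Serializable and Comparable, and reads the declared type of the private value field. That field name is an OpenJDK implementation detail, not public API; getDeclaredField only inspects the declaration, so no setAccessible call is needed.

```java
import java.io.Serializable;
import java.lang.reflect.Modifier;

public class StringTraits {
    // String is declared final, so no class can extend it.
    static boolean isFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    // String implements java.io.Serializable.
    static boolean isSerializable() {
        return Serializable.class.isAssignableFrom(String.class);
    }

    // String implements Comparable<String>, so strings can be ordered.
    static boolean isComparable() {
        return Comparable.class.isAssignableFrom(String.class);
    }

    // Declared type of the private "value" field (assumed OpenJDK field name).
    // Returns "unknown" if a JVM names the field differently.
    static String valueFieldType() {
        try {
            return String.class.getDeclaredField("value").getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("final        = " + isFinal());        // true
        System.out.println("Serializable = " + isSerializable()); // true
        System.out.println("Comparable   = " + isComparable());   // true
        System.out.println("compareTo    = " + "apple".compareTo("banana")); // -1 ('a' - 'b')
        // Expected on OpenJDK: char[] up to JDK 8, byte[] from JDK 9 on
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("value type   = " + valueFieldType());
    }
}
```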
JEP 254: Compact Strings
Author: Brent Christian · Owner: Xueming Shen · Type: Feature · Scope: Implementation · Status: Closed / Delivered · Release: 9 · Component: core-libs / java.lang · Discussion: core dash libs dash dev at openjdk dot java dot net · Effort: L · Duration: XL
Relates to: JEP 192: String Deduplication in G1; 8144691: JEP 254: Compact Strings: endiannes mismatch in Java source code and intrinsic; JEP 250: Store Interned Strings in CDS Archives; JEP 280: Indify String Concatenation
Reviewed by: Aleksey Shipilev, Brian Goetz, Charlie Hunt · Endorsed by: Brian Goetz
Created: 2014/08/04 21:54 · Updated: 2022/04/11 23:06 · Issue: 8054307
Summary
Adopt a more space-efficient internal representation for strings.
Goals
Improve the space efficiency of the String class and related classes while maintaining performance in most scenarios and preserving full compatibility for all related Java and native interfaces.
Non-Goals
It is not a goal to use alternate encodings such as UTF-8 in the internal representation of strings. A subsequent JEP may explore that approach.
Motivation
The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.
Description
We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.
String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.
This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.
The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.
For further detail, see: State of String Density Performance; String Density Impact on SPECjbb2005 on SPARC.
Alternatives
We tried a "compressed strings" feature in JDK 6 update releases, enabled by an -XX flag. When enabled, String.value was changed to an Object reference and would point either to a byte array, for strings containing only 7-bit US-ASCII characters, or else a char array. This implementation was not open-sourced, so it was difficult to maintain and keep in sync with the mainline JDK source. It has since been removed.
Testing
Thorough compatibility and regression testing will be essential for a change to such a fundamental part of the platform.
We will also need to confirm that we have fulfilled the performance goals of this project. Analysis of memory savings will need to be done. Performance testing should be done using a broad range of workloads, ranging from focused microbenchmarks to large-scale server workloads.
We will encourage the entire Java community to perform early testing with this change in order to identify any remaining issues.
Risks and Assumptions
Optimizing character storage for memory may well come with a trade-off in terms of run-time performance. We expect that this will be offset by reduced GC activity and that we will be able to maintain the throughput of typical server benchmarks. If not, we will investigate optimizations that can strike an acceptable balance between memory saving and run-time performance.
Other recent projects have already reduced the heap space used by strings, in particular JEP 192: String Deduplication in G1. Even with duplicates eliminated, the remaining string data can be made to consume less space if encoded more efficiently. We are assuming that this project will still provide a benefit commensurate with the effort required.
2. Motivation
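- The current implementation of the String class stores characters in a char array, using two bytes (16 bits) per character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters need only one byte of storage, so half of the space in the internal char arrays of these String objects goes unused.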
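The "half of the space" claim is simple arithmetic: n Latin-1 characters take 2n bytes in a UTF-16 char[] but only n bytes in a Latin-1 byte[]. A minimal sketch (class and method names are illustrative) that measures only the encoded character payload with String.getBytes; the real heap cost also includes the object header, the array header and padding, which this ignores.

```java
import java.nio.charset.StandardCharsets;

public class LatinOneSavings {
    // Payload bytes if stored as UTF-16, two bytes per char (as in a JDK 8 char[]).
    // UTF_16LE is used because plain UTF_16 would prepend a 2-byte byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Payload bytes if stored as ISO-8859-1/Latin-1, one byte per char.
    // Only lossless when every char is <= 0xFF.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String s = "Hello, JEP 254";
        System.out.println(s.length());     // 14
        System.out.println(utf16Bytes(s));  // 28
        System.out.println(latin1Bytes(s)); // 14
    }
}
```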
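3. Description
- We propose changing the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class stores characters encoded either as ISO-8859-1/Latin-1 (one byte per character) or as UTF-16 (two bytes per character), depending on the string's contents. The encoding flag indicates which encoding is in use.
- String-related classes such as AbstractStringBuilder, StringBuilder and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.
- This is purely an implementation change: no existing public interfaces change, and there are no plans to add new public APIs or other interfaces.
- The prototyping work done so far confirms the expected reduction in memory footprint, a substantial reduction in GC activity, and minor performance regressions in some corner cases.
- For details, see: State of String Density Performance; String Density Impact on SPECjbb2005 on SPARC.
- Conclusion: String no longer stores its data in a char[]; it now uses a byte[] plus an encoding flag, which halves the array storage for strings that contain only Latin-1 characters.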
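A toy sketch of "byte[] plus an encoding flag". CompactString and its members are hypothetical names for illustration, not the JDK source; OpenJDK's own String keeps a byte field named coder with LATIN1 and UTF16 values, but its real code is far more involved. The constructor picks Latin-1 when every char is at most 0xFF and otherwise stores two bytes per char (big-endian here, an arbitrary choice for the sketch).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical toy class illustrating JEP 254's idea: a byte[] plus an encoding flag.
// This is NOT java.lang.String's source code.
public final class CompactString {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char (big-endian in this sketch)

    private final byte[] value;
    private final byte coder;

    CompactString(String s) {
        if (fitsLatin1(s)) {
            // Every char is <= 0xFF, so ISO-8859-1 maps each char to exactly one byte.
            value = s.getBytes(StandardCharsets.ISO_8859_1);
            coder = LATIN1;
        } else {
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte
                value[2 * i + 1] = (byte) c;    // low byte
            }
            coder = UTF16;
        }
    }

    // True if every UTF-16 code unit fits in one Latin-1 byte.
    private static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    int length() {
        return coder == LATIN1 ? value.length : value.length / 2;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    byte coder() {
        return coder;
    }

    int storageBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactString ascii = new CompactString("abc");
        CompactString cjk = new CompactString("\u4E01\u603B"); // "丁总"
        System.out.println(ascii.coder() + " " + ascii.storageBytes()); // 0 3
        System.out.println(cjk.coder() + " " + cjk.storageBytes());     // 1 4
        System.out.println(cjk.charAt(1) == '\u603B');                  // true
    }
}
```

For "abc" the sketch stores 3 bytes with the Latin-1 flag; for "丁总" it falls back to 4 bytes with the UTF-16 flag, which mirrors the trade-off JEP 254 describes.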
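4. Basic characteristics of String
- String represents an immutable sequence of characters; this property is called immutability.
- When a string variable is reassigned, the variable is pointed at a different memory region (another String object); the original value is never overwritten.
- When an existing string is concatenated with another, the result is likewise placed in a new memory region; the original value is not modified.
- When String's replace() method is called to change a character or substring, the result is again placed in a new memory region; the original value is not modified.
- When a string is assigned through a literal (as opposed to new), the string value is stored in the string constant pool.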
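The last point contrasts literals with new. A minimal sketch (the class and method names are hypothetical) comparing references with == against contents with equals(), and showing that intern() returns the pooled instance:

```java
public class LiteralVsNew {
    // Two identical literals resolve to the same pooled String object.
    static boolean sameLiteral() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // new String(...) always creates a distinct object, even when the contents match.
    static boolean literalVsNew() {
        String a = "abc";
        String c = new String("abc");
        return a == c;
    }

    // intern() returns the pooled instance for equal contents.
    static boolean literalVsInterned() {
        String a = "abc";
        String c = new String("abc");
        return a == c.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());                   // true
        System.out.println(literalVsNew());                  // false
        System.out.println(literalVsInterned());             // true
        System.out.println(new String("abc").equals("abc")); // true: equals compares contents
    }
}
```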
```java
package string;

import org.junit.Test;

public class StringTest1 {
    @Test
    public void test1() {
        String s1 = "abc";
        String s2 = "abc";
        System.out.println(s1 == s2); // true
        System.out.println(s1); // abc
        System.out.println(s2); // abc
    }

    @Test
    public void test2() {
        String s1 = "abc";
        String s2 = "abc";
        s1 = "hello";
        System.out.println(s1 == s2); // false
        System.out.println(s1); // hello
        System.out.println(s2); // abc
    }

    @Test
    public void test3() {
        String s1 = "abc";
        String s2 = "abc";
        s2 += "def";
        System.out.println(s2); // abcdef
        System.out.println(s1); // abc
    }

    @Test
    public void test4() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        System.out.println(s1); // abc
        System.out.println(s2); // mbc
    }
}
```
5. Interview question
```java
package string;

public class StringExer {
    String str = new String("good");
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char[] ch) {
        str = "test ok";
        ch[0] = 'b';
    }

    public static void main(String[] args) {
        StringExer ex = new StringExer();
        ex.change(ex.str, ex.ch);
        System.out.println(ex.str); // good
        System.out.println(ex.ch);  // best
    }
}
```
Output (run on JDK 17):
```
good
best
```
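The output follows from Java passing arguments by value: change() receives a copy of the reference held in ex.str, so str = "test ok" rebinds only the local parameter and the field still refers to "good". ch[0] = 'b', by contrast, writes into the array object that the parameter and the field share. The hypothetical variant below (class and method names are made up) contrasts an immutable String with a mutable StringBuilder passed the same way:

```java
public class StringExerExplained {
    String str = new String("good");
    StringBuilder sb = new StringBuilder("good");

    // Reassigning a parameter only changes the local copy of the reference;
    // the caller's field still points to the original "good" object.
    void reassign(String s) {
        s = "test ok";
    }

    // A StringBuilder is mutable: appending through the copied reference
    // changes the one object that both the parameter and the field point to.
    void mutate(StringBuilder b) {
        b.append(" ok");
    }

    public static void main(String[] args) {
        StringExerExplained ex = new StringExerExplained();
        ex.reassign(ex.str);
        ex.mutate(ex.sb);
        System.out.println(ex.str); // good
        System.out.println(ex.sb);  // good ok
    }
}
```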