Fixing the Java Memory Model, Part 1

Java theory and practice


What is the Java Memory Model, and how was it broken in the first place?


The Java platform has integrated threading and multiprocessing into the language to a much greater degree than most previous programming languages. The language's support for platform-independent concurrency and multithreading was ambitious and pioneering, and, perhaps not surprisingly, the problem was a bit harder than the Java architects originally thought. Underlying much of the confusion surrounding synchronization and thread safety are the non-intuitive subtleties of the Java Memory Model (JMM), originally specified in Chapter 17 of the Java Language Specification, and re-specified by JSR 133.

For example, not all multiprocessor systems exhibit cache coherency; if one processor has an updated value of a variable in its cache, but one which has not yet been flushed to main memory, other processors may not see that update. In the absence of cache coherency, two different processors may see two different values for the same location in memory. This may sound scary, but it is by design -- it is a means of obtaining higher performance and scalability -- but it places a burden on developers and compilers to create code that accommodates these issues.

What is a memory model, and why do I need one?

A memory model describes the relationship between variables in a program (instance fields, static fields, and array elements) and the low-level details of storing them to and retrieving them from memory in a real computer system. Objects are ultimately stored in memory, but the compiler, runtime, processor, or cache may take some liberties with the timing of moving values to or from a variable's assigned memory location. For example, a compiler may choose to optimize a loop index variable by storing it in a register, or the cache may delay flushing a new value of a variable to main memory until a more opportune time. All of these optimizations are to aid in higher performance, and are generally transparent to the user, but on multiprocessor systems, these complexities may sometimes show through.

The JMM allows the compiler and cache to take significant liberties with the order in which data is moved between a processor-specific cache (or register) and main memory, unless the programmer has explicitly asked for certain visibility guarantees using synchronized or volatile. This means that in the absence of synchronization, memory operations can appear to happen in different orders from the perspective of different threads.

By contrast, languages like C and C++ do not have explicit memory models -- C programs instead inherit the memory model of the processor executing the program (although the compiler for a given architecture probably does know something about the memory model of the underlying processor, and some of the responsibility for compliance falls to the compiler). This means that concurrent C programs may run correctly on one processor architecture, but not another. While the JMM may be confusing at first, there is a significant benefit to it -- a program that is correctly synchronized according to the JMM should run correctly on any Java-enabled platform.

Shortcomings of the original JMM

While the JMM specified in Chapter 17 of the Java Language Specification was an ambitious attempt to define a consistent, cross-platform memory model, it had some subtle but significant failings. The semantics of synchronized and volatile were quite confusing, so much so that many knowledgeable developers chose to sometimes ignore the rules, because writing properly synchronized code under the old memory model was so difficult.

The old JMM allowed some surprising and confusing things to happen, such as final fields appearing not to have the value that was set in the constructor (thus making supposedly immutable objects not immutable) and unexpected results with memory operation reordering. It also prevented some otherwise effective compiler optimizations. If you've read any of the articles on the double-checked locking problem (see Related topics), you'll recall how confusing memory operation reordering can be, and how subtle but serious problems can sneak into your code when you don't properly synchronize (or actively try to avoid synchronizing). Worse, many incorrectly synchronized programs appear to work correctly in some situations, such as under light load, on uniprocessor systems, or on processors with stronger memory models than required by the JMM.

The term reordering is used to describe several classes of actual and apparent reorderings of memory operations:

  • The compiler is free to reorder certain instructions as an optimization when it would not change the semantics of the program.
  • The processor is allowed to execute operations out of order under some circumstances.
  • The cache is generally allowed to write variables back to main memory in a different order than they were written by the program.

Any of these conditions could cause operations to appear, from the perspective of another thread, to occur in a different order than specified by the program -- and regardless of the source of the reordering, all are considered equivalent by the memory model.

Goals of JSR 133

JSR 133, chartered to fix the JMM, has several goals:

  • Preserve existing safety guarantees, including type-safety.
  • Provide out-of-thin-air safety. This means that variable values are not created "out of thin air" -- so for a thread to observe a variable to have value X, some thread must have actually written the value X to that variable in the past.
  • The semantics of "correctly synchronized" programs should be as simple and intuitive as feasible. Thus, "correctly synchronized" should be defined both formally and intuitively (and the two definitions should be consistent with each other!).
  • Programmers should be able to create multithreaded programs with confidence that they will be reliable and correct. Of course, there is no magic that makes writing concurrent applications easy, but the goal is to relieve application writers of the burden of understanding all the subtleties of the memory model.
  • High performance JVM implementations across a wide range of popular hardware architectures should be possible. Modern processors differ substantially in their memory models; the JMM should accommodate as many possible architectures as practical, without sacrificing performance.
  • Provide a synchronization idiom that allows us to publish an object and have it be visible without synchronization. This is a new safety guarantee called initialization safety.
  • There should be minimal impact on existing code.

It is worth noting that broken techniques like double-checked locking are still broken under the new memory model, and that "fixing" double-checked locking was not one of the goals of the new memory model effort. (However, the new semantics of volatile allow one of the commonly proposed alternatives to double-checked locking to work correctly, although the technique is still discouraged.)
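That commonly proposed alternative is the volatile-based variant of lazy initialization: under the new semantics, the volatile write to the instance field cannot be reordered with the construction that precedes it, so a reader that sees a non-null reference also sees a fully constructed object. A minimal sketch (the class name and field are illustrative, not from the article):

```java
// Sketch of the volatile-based alternative to double-checked locking.
// Under JSR 133, the volatile write to "instance" cannot be reordered
// with the construction of the Singleton that precedes it.
class Singleton {
    private static volatile Singleton instance;

    static Singleton getInstance() {
        Singleton result = instance;          // single volatile read in the common case
        if (result == null) {
            synchronized (Singleton.class) {
                result = instance;            // re-check under the lock
                if (result == null) {
                    instance = result = new Singleton();
                }
            }
        }
        return result;
    }
}
```

Even so, a plain static initializer or the lazy-holder idiom is simpler and remains the preferred way to get lazy initialization.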

In the three years that the JSR 133 process has been active, it has been discovered that these issues are far more subtle than anyone had given them credit for. Such is the price of being a pioneer! The final formal semantics are more complicated than originally expected, and in fact took quite a different form than initially envisioned, but the informal semantics are clear and intuitive and will be outlined in Part 2 of this article.

Synchronization and visibility

Most programmers know that the synchronized keyword enforces a mutex (mutual exclusion) that prevents more than one thread at a time from entering a synchronized block protected by a given monitor. But synchronization also has another aspect: It enforces certain memory visibility rules as specified by the JMM. It ensures that caches are flushed when exiting a synchronized block and invalidated when entering one, so that a value written by one thread during a synchronized block protected by a given monitor is visible to any other thread executing a synchronized block protected by that same monitor. It also ensures that the compiler does not move instructions from inside a synchronized block to outside (although it can in some cases move instructions from outside a synchronized block inside). The JMM does not make this guarantee in the absence of synchronization -- which is why synchronization (or its younger sibling, volatile) must be used whenever multiple threads are accessing the same variables.
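Concretely, when two threads synchronize on the same monitor, the unlock by one and the subsequent lock by the other guarantee that writes made before the unlock are visible after the lock. A small sketch combining both aspects, mutual exclusion and visibility (the class name and iteration count are illustrative):

```java
// Two threads increment a shared counter; synchronizing on the same
// monitor provides both mutual exclusion and visibility, so the final
// count is exactly 2 * N. Without synchronization, increments could be
// lost or remain invisible to the other thread.
class SharedCounter {
    private int count = 0;

    synchronized void increment() { count++; }

    synchronized int get() { return count; }

    static int runDemo() throws InterruptedException {
        final int N = 100_000;
        SharedCounter c = new SharedCounter();
        Runnable task = () -> {
            for (int i = 0; i < N; i++) c.increment();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return c.get();   // guaranteed to be 2 * N
    }
}
```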

Problem #1: Immutable objects weren't

One of the most surprising failings of the JMM is that immutable objects, whose immutability was intended to be guaranteed through use of the final keyword, could appear to change their value. (Public Service Reminder: Making all fields of an object final does not necessarily make the object immutable -- all fields must also be primitive types or references to immutable objects.) Immutable objects, like String, are supposed to require no synchronization. However, because of potential delays in propagating memory writes from one thread to another, there is a possible race condition that would allow a thread to first see one value for an immutable object, and then at some later time see a different value.

How can this happen? Consider the implementation of String in the Sun 1.4 JDK, where there are basically three important final fields: a reference to a character array, a length, and an offset into the character array that describes the start of the string being represented. String was implemented this way, instead of having only the character array, so character arrays could be shared among multiple String and StringBuffer objects without having to copy the text into a new array every time a String is created. For example, String.substring() creates a new string that shares the same character array with the original String and merely differs in the length and offset fields.

Suppose you execute the following code:

String s1 = "/usr/tmp";
String s2 = s1.substring(4);   // contains "/tmp"

The string s2 will have an offset of 4 and a length of 4, but will share the same character array, the one containing "/usr/tmp", with s1. Before the String constructor runs, the constructor for Object will initialize all fields, including the final length and offset fields, with their default values. When the String constructor runs, the length and offset are then set to their desired values. But under the old memory model, in the absence of synchronization, it is possible for another thread to temporarily see the offset field as having the default value of 0, and then later see the correct value of 4. The effect is that the value of s2 changes from "/usr" to "/tmp". This is not what was intended, and it might not be possible on all JVMs or platforms, but it was allowed by the old memory model specification.
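Under the new model, final fields written in a constructor are guaranteed to be visible to any thread that obtains a reference to the object, provided the reference does not escape during construction. A sketch of a String-like holder in the spirit of the example above (the class and field names are illustrative, not the actual String implementation):

```java
// A String-like immutable holder: all fields are final and set in the
// constructor. Under JSR 133, any thread that sees a reference to a
// properly constructed Substring also sees offset and length correctly
// initialized; under the old JMM, a racing reader could briefly observe
// their default (zero) values.
class Substring {
    private final char[] chars;   // shared, never mutated after construction
    private final int offset;
    private final int length;

    Substring(char[] chars, int offset, int length) {
        this.chars = chars;       // "this" must not escape before the constructor returns
        this.offset = offset;
        this.length = length;
    }

    @Override
    public String toString() {
        return new String(chars, offset, length);
    }
}
```

For example, `new Substring("/usr/tmp".toCharArray(), 4, 4)` represents "/tmp" while sharing the original character array, just as substring() does.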

Problem #2: Reordering volatile and nonvolatile stores

The other major area where the existing JMM caused some very confusing results was with memory operation reordering on volatile fields. The existing JMM says that volatile reads and writes are to go directly to main memory, prohibiting caching values in registers and bypassing processor-specific caches. This allows multiple threads to always see the most up-to-date value for a given variable. However, it turns out that this definition of volatile was not as useful as first intended, and resulted in significant confusion on the actual meaning of volatile.

In order to provide good performance in the absence of synchronization, the compiler, runtime, and cache are generally allowed to reorder ordinary memory operations as long as the currently executing thread cannot tell the difference. (This is referred to as within-thread as-if-serial semantics.) Volatile reads and writes, however, are totally ordered across threads; the compiler or cache cannot reorder volatile reads and writes with each other. Unfortunately, the JMM did allow volatile reads and writes to be reordered with respect to ordinary variable reads and writes, meaning that we cannot use volatile flags as an indication of what operations have been completed. Consider the following code, where the intention is that the volatile field initialized is supposed to indicate that initialization has been completed:

Listing 1. Using a volatile field as a "guard" variable.
Map configOptions;
char[] configText;
volatile boolean initialized = false;
 
. . .
 
// In thread A
 
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;
 
. . .
 
// In thread B
 
while (!initialized)
   sleep();
// use configOptions

The idea here is that the volatile variable initialized acts as a guard to indicate that a set of other operations have completed. It's a good idea, but under the old JMM it didn't work, because the old JMM allowed nonvolatile writes (such as the write to the configOptions field, as well as the writes to the fields of the Map referenced by configOptions) to be reordered with volatile writes. So another thread might see initialized as true, but not yet have a consistent or current view of the field configOptions or the objects it references. The old semantics of volatile only made promises about the visibility of the variable being read or written, and no promises about other variables. While this approach was easier to implement efficiently, it turned out to be less useful than initially thought.
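Under JSR 133, this idiom does work: the write to the volatile flag happens-before any read that observes it as true, so everything thread A wrote before setting initialized is visible to thread B afterward. A runnable sketch of the pattern as it behaves under the new model (field names follow Listing 1; the config entry is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// The volatile-guard idiom from Listing 1, as it behaves under JSR 133:
// the volatile write to "initialized" happens-before any read that sees
// it as true, so the reader also sees the fully populated map.
class ConfigLoader {
    static Map<String, String> configOptions;
    static volatile boolean initialized = false;

    static String runDemo() throws InterruptedException {
        Thread writer = new Thread(() -> {
            Map<String, String> m = new HashMap<>();
            m.put("mode", "fast");           // illustrative config entry
            configOptions = m;
            initialized = true;              // volatile write: publishes the map
        });
        writer.start();
        while (!initialized) {               // volatile read: guaranteed to see the write
            Thread.sleep(1);
        }
        return configOptions.get("mode");    // safe: happens-before established
    }
}
```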

Summary

As specified in Chapter 17 of the Java Language Specification, the JMM has some serious flaws that allow unintuitive and undesirable things to happen to reasonable-looking programs. If it is too difficult to write concurrent classes properly, then many concurrent classes are guaranteed not to work as expected, and that is a flaw in the platform. Fortunately, it was possible to create a memory model that is more consistent with most developers' intuition while not breaking any code that was properly synchronized under the old memory model, and the JSR 133 process has done just that. Next month, we'll look at the details of the new memory model (much of which is already built into the 1.4 JDK).
