Reposted from http://urainday.blogbus.com/logs/62450226.html
Chapter 3. Sharing Objects
@author urainday 2010.4.19
I. Chapter structure
3.3. Thread Confinement
This chapter examines techniques for sharing and publishing objects so they can be safely accessed by multiple threads. Together, they lay the foundation for building thread-safe classes and safely structuring concurrent applications using the java.util.concurrent library classes.
We have seen how synchronized blocks and methods can ensure that operations execute atomically, but it is a common misconception that synchronized is only about atomicity or demarcating "critical sections". Synchronization also has another significant, and subtle, aspect: memory visibility. We want not only to prevent one thread from modifying the state of an object when another is using it, but also to ensure that when a thread modifies the state of an object, other threads can actually see the changes that were made. But without synchronization, this may not happen. You can ensure that objects are published safely either by using explicit synchronization or by taking advantage of the synchronization built into library classes.
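Purpose of this chapter: how to publish objects so that multiple threads can access them safely (the topic is how to publish objects, not how to publish thread-safe objects). A key concept here is visibility. How do you tell whether an object has been published safely? The author gives a principle: when one thread modifies an object's state, other threads must see that latest state rather than stale data. Synchronization is not only about atomicity but also about visibility, and this chapter focuses on visibility. How do we synchronize? There are two ways: an explicit synchronization policy, or relying on existing thread-safe library classes.
II. Reading notes
1. Without a synchronization policy, when one thread updates a value, is the new value written to heap memory?
This involves the Java Memory Model. I really didn't understand it and was confused here for a long time. In the end, working from the author's text, I arrived at an explanation that accounts for all the strange visibility problems in this chapter.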
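public class NoVisibility {
    private static boolean ready;   // shared, neither volatile nor guarded by a lock
    private static int number;      // shared, neither volatile nor guarded by a lock
    private static class ReaderThread extends Thread {
        public void run() {
            // May spin forever if the write to ready never becomes visible
            while (!ready)
                Thread.yield();
            // May print 0 if ready = true becomes visible before number = 42
            System.out.println(number);
        }
    }
    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;   // no synchronization: no visibility or ordering guarantee
    }
}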
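The author gives the very interesting example above; again, one word describes it: "classic". At first I could not understand why it might print 0 instead of 42, or why it might loop forever. (On my own machine it always prints 42, but printing 0 or looping forever are both possible.) Last night I finally came up with an explanation. I don't yet know whether it is right, because I have not read much about the Java Memory Model.
In step one, the main thread starts another thread, which checks whether ready is true; once it is, that thread leaves the loop and prints number. In step two, the main thread updates number (from the default 0 to 42), and in step three it updates ready (from the default false to true). There is an illusion here: since number was set to 42 before ready became true, the reader thread must print 42. But that is wrong. First, note that number and ready are values shared by the two threads.
So where are these values shared? Answer: in heap memory.
Does the main thread's update of number and ready change the values in heap memory immediately? Answer: no. The new values may first sit in registers or caches (according to the author, this is done for performance) and be written back to memory later, possibly reordered. Reordering means the writes do not have to follow the main thread's program order: number being assigned first does not mean number is written to memory first and ready second. It is entirely possible that ready is written first and number afterwards, or that neither is written at all. Without a synchronization mechanism there is no guarantee that the values reach memory at all, or in which order. In the main thread I can find number=42 and ready=true in my own registers and caches, which are not shared, but what about the other thread? It can only look in memory, and if memory has not been updated it may find any of the four combinations (ready, number) = (false, 0), (false, 42), (true, 0), or (true, 42). While it sees false it keeps looping; if it sees (true, 0) it prints 0. So the main thread updated several values, yet another thread cannot see those updates: that is NoVisibility.
This point is important. The previous chapter said that synchronization includes the synchronized keyword, volatile variables, explicit locks, and atomic variables, and all of them guarantee visibility. In my mental model, they guarantee that a value is written to memory rather than left in a register or cache, and that the order of the updates is preserved (the book has the details). Other threads can then see the updated values in time, and the program behaves as we expect.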
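A minimal sketch of one way to fix the example, assuming the only change that matters is declaring ready volatile; the class name VolatileVisibility, the observed field and the runOnce helper are hypothetical additions so the result can be checked. Under the Java Memory Model a write to a volatile field happens-before every subsequent read of that field, so once the reader sees ready == true it is also guaranteed to see the earlier plain write number = 42.

```java
// Sketch: NoVisibility with ready declared volatile (names other than ready/number are hypothetical).
class VolatileVisibility {
    private static volatile boolean ready;   // volatile: the write in main becomes visible to the reader
    private static int number;               // plain field, published by the later volatile write to ready
    private static int observed;             // hypothetical: what the reader saw, read by main after join()

    private static class ReaderThread extends Thread {
        @Override
        public void run() {
            while (!ready)
                Thread.yield();
            int seen = number;                // ready was seen as true, so number = 42 happens-before this read
            observed = seen;
            System.out.println(seen);
        }
    }

    static int runOnce() throws InterruptedException {
        ready = false;                        // reset so the helper can be called repeatedly
        number = 0;
        ReaderThread reader = new ReaderThread();
        reader.start();
        number = 42;                          // plain write ...
        ready = true;                         // ... made visible together with this volatile write
        reader.join();                        // join() makes observed visible to this thread
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw " + runOnce());
    }
}
```

Making number volatile as well is not needed here: the volatile write to ready also publishes every write the main thread made before it.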
The author wrote the following passage, which is worth reading carefully:
In general, there is no guarantee that the reading thread will see a value written by another thread on a timely basis, or even at all. In order to ensure visibility of memory writes across threads, you must use synchronization.
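Without synchronization, if one thread writes data and another reads it, the reader may get stale data. Synchronization must guarantee both atomicity and visibility; see Listing 3.1. The author also says: Reasoning about insufficiently synchronized concurrent programs is prohibitively difficult. So be careful when designing multithreaded programs. How do you avoid this problem? The author's answer: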
there's an easy way to avoid these complex issues: always use the proper synchronization whenever data is shared across threads.
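Applied to a single shared value, "proper synchronization" can be as simple as the sketch below (SynchronizedValue and readAfterWrite are hypothetical names). The key design choice is that the getter is synchronized too: a lock only guarantees visibility between threads that acquire the same lock, so synchronizing the setter alone would leave readers with no guarantee of seeing the latest value.

```java
// Hypothetical holder: every access to value goes through the same intrinsic lock (this).
class SynchronizedValue {
    private int value;                                 // guarded by this

    public synchronized int get() { return value; }

    public synchronized void set(int newValue) { value = newValue; }

    // A reader thread polls get() while the calling thread calls set(42).
    static int readAfterWrite() throws InterruptedException {
        SynchronizedValue shared = new SynchronizedValue();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            int v;
            while ((v = shared.get()) == 0)            // each get() acquires the lock that set() releases
                Thread.yield();
            seen[0] = v;
        });
        reader.start();
        shared.set(42);
        reader.join();                                 // join() makes seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readAfterWrite());
    }
}
```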
2. When should the volatile keyword be used?
Consider the following passage:
The Java Memory Model requires fetch and store operations to be atomic, but for nonvolatile long and double variables, the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations. If the reads and writes occur in different threads, it is therefore possible to read a nonvolatile long and get back the high 32 bits of one value and the low 32 bits of another. [3] Thus, even if you don't care about stale values, it is not safe to use shared mutable long and double variables in multithreaded programs unless they are declared volatile or guarded by a lock.
Good uses of volatile variables include ensuring the visibility of their own state, that of the object they refer to, or indicating that an important lifecycle event (such as initialization or shutdown) has occurred.
Volatile variables are convenient, but they have limitations. The most common use for volatile variables is as a completion, interruption, or status flag, such as the asleep flag in Listing 3.4. Volatile variables can be used for other kinds of state information, but more care is required when attempting this. For example, the semantics of volatile are not strong enough to make the increment operation (count++) atomic, unless you can guarantee that the variable is written only from a single thread. (Atomic variables do provide atomic read-modify-write support and can often be used as "better volatile variables"; see Chapter 15.)
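To sum up: in concurrent programs, shared 8-byte long and double variables must be declared volatile or guarded by a lock, because the JVM is allowed to perform a 64-bit read or write as two separate 32-bit operations; without this, a reader may get the high 32 bits of one value and the low 32 bits of another. Also read the second passage carefully: "The most common use for volatile variables is as a completion, interruption, or status flag, such as the asleep flag in Listing 3.4. Volatile variables can be used for other kinds of state information, but more care is required when attempting this."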
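Both points can be seen in one small sketch, assuming a hypothetical CountingWorker class: a volatile boolean used as a stop flag, and a volatile long that only the worker thread ever writes. Because only one thread writes iterations, iterations++ is safe here, and because the field is volatile, other threads read whole, up-to-date 64-bit values rather than torn halves.

```java
// Hypothetical worker: a volatile stop flag plus a volatile long with a single writer.
class CountingWorker implements Runnable {
    private volatile boolean stopRequested;            // status flag, set by another thread
    private volatile long iterations;                  // 64-bit: volatile makes reads and writes atomic and visible

    @Override
    public void run() {
        while (!stopRequested) {
            iterations++;                              // safe only because this thread is the sole writer
            Thread.yield();
        }
    }

    void requestStop() { stopRequested = true; }

    long getIterations() { return iterations; }

    static long runFor(long millis) throws InterruptedException {
        CountingWorker worker = new CountingWorker();
        Thread thread = new Thread(worker);
        thread.start();
        Thread.sleep(millis);
        worker.requestStop();                          // the worker sees this write and leaves its loop
        thread.join();
        return worker.getIterations();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("iterations: " + runFor(100));
    }
}
```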
More on volatile
The Java language also provides an alternative, weaker form of synchronization, volatile variables, to ensure that updates to a variable are propagated predictably to other threads. When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
Volatile guarantees visibility (but not atomicity of compound actions such as count++).
Drawback: code that relies on volatile variables for visibility of arbitrary state is more fragile and harder to understand than code that uses locking.
The author's summary of volatile:
You can use volatile variables only when all the following criteria are met:
Writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value;
The variable does not participate in invariants with other state variables; and
Locking is not required for any other reason while the variable is being accessed.
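The first criterion is the one most easily broken. In this sketch (CounterComparison is a hypothetical name) several threads increment two counters: count++ on a volatile int is a separate read, add and write, so concurrent increments can be lost, while AtomicInteger.incrementAndGet() performs the whole update atomically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CounterComparison {
    static volatile int volatileCount;                 // breaks criterion 1: count++ depends on the current value
    static final AtomicInteger atomicCount = new AtomicInteger();

    static int[] run(int threads, int incrementsPerThread) throws InterruptedException {
        volatileCount = 0;
        atomicCount.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    volatileCount++;                   // read, add, write: another thread's update can be lost
                    atomicCount.incrementAndGet();     // atomic read-modify-write
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = run(4, 100_000);
        System.out.println("volatile: " + result[0] + ", atomic: " + result[1]);
    }
}
```

Only the atomic total is predictable; the volatile total may come out lower and can vary from run to run.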
3. Publication and escape
Definition:
Publishing an object means making it available to code outside of its current scope, such as by storing a reference to it where other code can find it, returning it from a nonprivate method, or passing it to a method in another class.
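That is three cases: storing a reference where other code can find it, returning it from a nonprivate method, and passing it to a method in another class.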
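The three routes in the definition can be put side by side. In this sketch every name (PublicationRoutes, knownNames, getLog, Registry, handOff) is hypothetical; each route makes an object reachable from code outside the scope that created it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical other class that receives references.
class Registry {
    final List<Object> seen = new ArrayList<>();

    void register(Object o) {
        seen.add(o);                                   // keeps the reference: the object is now published
    }
}

class PublicationRoutes {
    // Route 1: store a reference where other code can find it (a public static field).
    public static Set<String> knownNames;

    private final List<String> log = new ArrayList<>();

    static void initialize() {
        knownNames = new HashSet<>();                  // anyone reading knownNames now reaches this set
    }

    // Route 2: return a reference from a nonprivate method.
    public List<String> getLog() {
        return log;                                    // callers get the internal list itself
    }

    // Route 3: pass a reference to a method in another class.
    void handOff(Registry registry) {
        registry.register(log);                        // Registry may store or share it as it likes
    }

    public static void main(String[] args) {
        initialize();
        PublicationRoutes routes = new PublicationRoutes();
        routes.getLog().add("added by outside code");  // outside code mutates private state
        Registry registry = new Registry();
        routes.handOff(registry);
        System.out.println(knownNames + " " + registry.seen);
    }
}
```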
Publishing internal state variables can compromise encapsulation and make it more difficult to preserve invariants; publishing objects before they are fully constructed can compromise thread safety.
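The second risk in the passage above, publishing an object before it is fully constructed, typically happens when a constructor hands out this. In this sketch (EventSource, Listener, UnsafeHandler, SafeHandler and ThisEscapeDemo are hypothetical names) the event source calls a new listener back immediately, which makes the problem visible even in a single thread: the unsafe handler is called before its name field is assigned. The safer version uses a private constructor plus a static factory method, so the object is registered only after construction has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface.
interface Listener {
    void onEvent(String event);
}

// Hypothetical event source that calls a new listener back immediately.
class EventSource {
    private final List<Listener> listeners = new ArrayList<>();

    void register(Listener listener) {
        listeners.add(listener);
        listener.onEvent("registered");                // runs before register() returns
    }
}

// Unsafe: "this" escapes from the constructor before name is assigned.
class UnsafeHandler implements Listener {
    private final String name;
    boolean called;                                    // set by onEvent
    String seenName;                                   // name as seen by onEvent

    UnsafeHandler(EventSource source, String name) {
        source.register(this);                         // "this" escapes here ...
        this.name = name;                              // ... before this assignment runs
    }

    @Override
    public void onEvent(String event) {
        called = true;
        seenName = name;                               // still null during the callback above
    }
}

// Safer: private constructor plus factory, so registration happens after construction.
class SafeHandler implements Listener {
    private final String name;
    boolean called;
    String seenName;

    private SafeHandler(String name) {
        this.name = name;
    }

    static SafeHandler newInstance(EventSource source, String name) {
        SafeHandler handler = new SafeHandler(name);   // fully constructed first
        source.register(handler);                      // only then published
        return handler;
    }

    @Override
    public void onEvent(String event) {
        called = true;
        seenName = name;
    }
}

class ThisEscapeDemo {
    public static void main(String[] args) {
        EventSource source = new EventSource();
        System.out.println(new UnsafeHandler(source, "unsafe").seenName);
        System.out.println(SafeHandler.newInstance(source, "safe").seenName);
    }
}
```

With a real event source that delivers events from another thread, the unsafe handler could be observed half-built at any moment, not just during this callback.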
An object that is published when it should not have been is said to have escaped. (This is the definition of escape.)
Once an object escapes, you have to assume that another class or thread may, maliciously or carelessly, misuse it.
The author's example:
[7] If someone steals your password and posts it on the alt.free-passwords newsgroup, that information has escaped: whether or not someone has (yet) used those credentials to create mischief, your account has still been compromised. Publishing a reference poses the same sort of risk.
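A common way to keep internal state from escaping is to hand out a copy instead of the field itself. In this sketch the class States and its methods are hypothetical; getStatesUnsafe returns the private array, so any caller can change it, while getStatesCopy returns a copy, so changes made by callers never reach the object.

```java
import java.util.Arrays;

// Hypothetical class whose internal state is a mutable array.
class States {
    // final fixes the reference, not the array contents.
    private final String[] states = { "AK", "AL" };

    // Unsafe: the private array escapes, so any caller can modify it.
    public String[] getStatesUnsafe() {
        return states;
    }

    // Safer: callers get a copy, and the private array never leaves the object.
    public String[] getStatesCopy() {
        return Arrays.copyOf(states, states.length);
    }

    public String first() {
        return states[0];
    }

    public static void main(String[] args) {
        States s = new States();
        s.getStatesCopy()[0] = "XX";                   // changes only the copy
        System.out.println(s.first());
        s.getStatesUnsafe()[0] = "XX";                 // changes the private array
        System.out.println(s.first());
    }
}
```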
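4. Some thoughts and a summary of the previous two sections:
The goal of Java multithreading is to improve system performance while guaranteeing thread safety, and thread safety comes first. As Chapter 2 says, the key to thread safety is managing access to objects shared by multiple threads. So how do you design a thread-safe class? First, the core of a class is its fields; methods are merely the external interface for manipulating those fields, and the fields are usually encapsulated inside the object. (You can of course have public fields, but then whether the class is thread-safe is no longer decided by that class alone.) Fields come in two kinds, primitive fields and object fields, and object fields can be further divided into mutable and immutable objects. For the eight primitive types Java copies the value itself, while for an object field what gets copied is the reference, so both sides end up pointing at the same object. (Strictly speaking, Java always passes by value, and for objects the value is the reference; thinking of it as "passing by reference" is just a memory aid.) Now suppose I have a class A with fields of all three kinds: (primitive) int a, (immutable) String b, (mutable) Person c.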
class A {
    private int a = 1;                 // primitive field
    private String b = "abc";          // immutable object field
    private Person c = new Person();   // mutable object field (Person is not shown)
}
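Class A is now data shared by many threads (Thread1, Thread2, …). Judging from the first four chapters of the book, the author's intent is to resolve thread-safety hazards inside the shared class (here, A), rather than letting the problem spread to the whole program, where we would have to examine every thread carefully. One reason is simplicity: it is much easier to ensure thread safety this way. The second is that multithreading problems are complex: if I had to examine the whole program, even after checking all of it I could not be sure it was thread-safe, because some problem might be hiding in a corner nobody noticed. So the goal is: resolve thread-safety hazards inside the shared class (here, A).
That raises a new question: what kind of shared class counts as thread-safe? Chapter 2 gives various definitions; put simply, a class is thread-safe if its behavior does not change depending on whether it runs in a single thread or in multiple threads. To achieve this we introduced many tools: synchronized, volatile, the concurrent library, and so on all serve this purpose. They are our weapons for ensuring thread safety, and we need to understand what each weapon does and its pros and cons. Two important concepts are involved: atomicity and visibility, which are the basic conditions for a class to be thread-safe. So how do we resolve thread-safety hazards inside the shared class (here, A)? That is where publication and escape come in.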
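As one possible illustration of solving the problem inside class A, here is a sketch under stated assumptions: SafeA is a hypothetical variant of A, and Person is a hypothetical mutable class with a single name field, since the original Person is not shown. All three fields are private and guarded by the object's own lock, the immutable String can be returned directly, and the mutable Person is never handed out; callers get a copy instead.

```java
// Hypothetical mutable class standing in for Person (its real fields are unknown).
class Person {
    private String name;
    Person(String name) { this.name = name; }
    Person(Person other) { this.name = other.name; }    // copy constructor
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Sketch of class A made responsible for its own thread safety:
// every field is private and every access goes through the same lock (this).
class SafeA {
    private int a = 1;                                  // primitive: guarded by this
    private String b = "abc";                           // immutable object: the reference is guarded by this
    private final Person c = new Person("init");        // mutable object: never handed out directly

    public synchronized int getA() { return a; }
    public synchronized void incrementA() { a++; }      // read-modify-write made atomic by the lock

    public synchronized String getB() { return b; }     // safe to return: String is immutable
    public synchronized void setB(String b) { this.b = b; }

    public synchronized void renamePerson(String name) { c.setName(name); }
    public synchronized Person snapshotPerson() {
        return new Person(c);                           // a copy escapes, not c itself
    }

    public static void main(String[] args) {
        SafeA shared = new SafeA();
        shared.incrementA();
        System.out.println(shared.getA() + " " + shared.getB() + " " + shared.snapshotPerson().getName());
    }
}
```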