The primary design goal of this hash table is to maintain concurrent readability (typically method get(), but also iterators and related methods) while minimizing update contention. Secondary goals are to keep space consumption about the same or better than java.util.HashMap, and to support high initial insertion rates on an empty table by many threads.
This map usually acts as a binned (bucketed) hash table. Each key-value mapping is held in a Node. Most nodes are instances of the basic Node class with hash, key, value, and next fields. However, various subclasses exist: TreeNodes are arranged in balanced trees, not lists. TreeBins hold the roots of sets of TreeNodes. ForwardingNodes are placed at the heads of bins during resizing. ReservationNodes are used as placeholders while establishing values in computeIfAbsent and related methods. The types TreeBin, ForwardingNode, and ReservationNode do not hold normal user keys, values, or hashes, and are readily distinguishable during search etc because they have negative hash fields and null key and value fields. (These special nodes are either uncommon or transient, so the impact of carrying around some unused fields is insignificant.)
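As a rough sketch, the basic node shape and the hash values reserved for the special node types look like this (abbreviated from the JDK source; the real Node also implements Map.Entry and defines a find method):

    // Hash values reserved for special (non-user) nodes; regular nodes
    // always carry non-negative hashes (see the sign-bit note below).
    static final int MOVED    = -1; // hash for forwarding nodes
    static final int TREEBIN  = -2; // hash for roots of bins converted to trees
    static final int RESERVED = -3; // hash for transient reservation nodes

    // Basic bin node; TreeNode, TreeBin, ForwardingNode and ReservationNode
    // extend this shape, but with negative hashes and null keys/values.
    static class Node<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
        Node(int hash, K key, V val, Node<K,V> next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }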
The table is lazily initialized to a power-of-two size upon the first insertion. Each bin in the table normally contains a list of Nodes (most often, the list has only zero or one Node). Table accesses require volatile/atomic reads, writes, and CASes. Because there is no other way to arrange this without adding further indirections, we use intrinsics (sun.misc.Unsafe) operations.
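What the table-slot intrinsics accomplish can be pictured with AtomicReferenceArray, which provides the same volatile-read, volatile-write, and CAS semantics per slot; this is illustration only, since the class itself works on a plain Node[] via sun.misc.Unsafe precisely to avoid this extra layer of indirection:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Illustration of what tabAt/casTabAt/setTabAt provide, using
    // AtomicReferenceArray instead of Unsafe. Node is the basic bin
    // node sketched above.
    final class TableAccess<K,V> {
        final AtomicReferenceArray<Node<K,V>> tab;
        TableAccess(int size) { tab = new AtomicReferenceArray<>(size); }

        Node<K,V> tabAt(int i)                            { return tab.get(i); }                 // volatile read
        boolean casTabAt(int i, Node<K,V> c, Node<K,V> v) { return tab.compareAndSet(i, c, v); } // CAS
        void setTabAt(int i, Node<K,V> v)                 { tab.set(i, v); }                     // volatile write
    }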
We use the top (sign) bit of Node hash fields for control purposes -- it is available anyway because of addressing constraints. Nodes with negative hash fields are specially handled or ignored in map methods.
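Concretely, user hash codes are spread and then masked down to the non-negative range before indexing, as in the spread function of the JDK source:

    static final int HASH_BITS = 0x7fffffff; // usable bits of a normal node hash

    // Spread higher bits downward and force the sign bit to zero, so that
    // negative hashes remain reserved for the special node types above.
    static final int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }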
Insertion (via put or its variants) of the first node in an empty bin is performed by just CASing it to the bin. This is by far the most common case for put operations under most key/hash distributions. Other update operations (insert, delete, and replace) require locks. We do not want to waste the space required to associate a distinct lock object with each bin, so instead use the first node of a bin list itself as a lock. Locking support for these locks relies on builtin "synchronized" monitors.
Using the first node of a list as a lock does not by itself suffice though: When a node is locked, any update must first validate that it is still the first node after locking it, and retry if not. Because new nodes are always appended to lists, once a node is first in a bin, it remains first until deleted or the bin becomes invalidated (upon resizing).
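A heavily simplified sketch of the put path shows both rules together: CAS into an empty bin, otherwise lock the first node and re-check that it is still first. Counting, treeification, and resize assistance are omitted, and AtomicReferenceArray again stands in for the Unsafe-based table accesses:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Skeletal put path (sketch only, not the real implementation).
    static <K,V> V putSketch(AtomicReferenceArray<Node<K,V>> tab, K key, V value, int hash) {
        for (;;) {
            int i = (tab.length() - 1) & hash;        // hash is already spread, non-negative
            Node<K,V> f = tab.get(i);                 // volatile read of the bin head
            if (f == null) {
                // Most common case: CAS the new node into the empty bin.
                if (tab.compareAndSet(i, null, new Node<>(hash, key, value, null)))
                    return null;
                // Another thread installed a head first; retry.
            } else {
                synchronized (f) {
                    if (tab.get(i) == f) {
                        // f is still the first node, so the lock is valid:
                        // walk and append/replace in the list (or tree) here.
                        return null;
                    }
                    // Bin head changed (e.g. during a resize); loop and retry.
                }
            }
        }
    }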
The main disadvantage of per-bin locks is that other update operations on other nodes in a bin list protected by the same lock can stall, for example when user equals() or mapping functions take a long time. However, statistically, under random hash codes, this is not a common problem. Ideally, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average, given the resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
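The table above can be reproduced with a few lines of Java (illustration only):

    // Prints P[k nodes in a bin] for a Poisson distribution with mean 0.5,
    // i.e. exp(-0.5) * pow(0.5, k) / factorial(k).
    public class BinSizeProbabilities {
        public static void main(String[] args) {
            double lambda = 0.5;
            double p = Math.exp(-lambda);        // k = 0 term
            for (int k = 0; k <= 8; k++) {
                System.out.printf("%d: %.8f%n", k, p);
                p = p * lambda / (k + 1);        // next term
            }
        }
    }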
Lock contention probability for two threads accessing distinct elements is roughly 1 / (8 * #elements) under random hashes.
Actual hash code distributions encountered in practice sometimes deviate significantly from uniform randomness. This includes the case when N > (1<<30), so some keys MUST collide. Similarly for dumb or hostile usages in which multiple keys are designed to have identical hash codes or ones that differ only in masked-out high bits. So we use a secondary strategy that applies when the number of nodes in a bin exceeds a threshold. These TreeBins use a balanced tree to hold nodes (a specialized form of red-black trees), bounding search time to O(log N). Each search step in a TreeBin is at least twice as slow as in a regular list, but given that N cannot exceed (1<<64) (before running out of addresses) this bounds search steps, lock hold times, etc, to reasonable constants (roughly 100 nodes inspected per operation worst case) so long as keys are Comparable (which is very common -- String, Long, etc). TreeBin nodes (TreeNodes) also maintain the same "next" traversal pointers as regular nodes, so can be traversed in iterators in the same way.
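In the JDK source the relevant thresholds are plain constants; conversion to a TreeBin is also skipped (a resize is done instead) while the table itself is still small:

    static final int TREEIFY_THRESHOLD    = 8;  // list -> tree when a bin reaches this size
    static final int UNTREEIFY_THRESHOLD  = 6;  // tree -> list when a bin shrinks below this during resize
    static final int MIN_TREEIFY_CAPACITY = 64; // smallest table size at which bins may be treeified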
The table is resized when occupancy exceeds a percentage threshold (nominally, 0.75, but see below). Any thread noticing an overfull bin may assist in resizing after the initiating thread allocates and sets up the replacement array. However, rather than stalling, these other threads may proceed with insertions etc. The use of TreeBins shields us from the worst case effects of overfilling while resizes are in progress. Resizing proceeds by transferring bins, one by one, from the table to the next table. However, threads claim small blocks of indices to transfer (via field transferIndex) before doing so, reducing contention. A generation stamp in field sizeCtl ensures that resizings do not overlap. Because we are using power-of-two expansion, the elements from each bin must either stay at the same index, or move with a power of two offset. We eliminate unnecessary node creation by catching cases where old nodes can be reused because their next fields won't change. On average, only about one-sixth of them need cloning when a table doubles. The nodes they replace will be garbage collectable as soon as they are no longer referenced by any reader thread that may be in the midst of concurrently traversing the table. Upon transfer, the old table bin contains only a special forwarding node (with hash field "MOVED") that contains the next table as its key. On encountering a forwarding node, access and update operations restart, using the new table.
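The forwarding node left behind in an emptied bin looks roughly like this; in the JDK source the new table is held in a dedicated nextTable field, and any operation that sees the MOVED hash simply continues (or, for writers, helps transfer) in that table:

    // Placed at the head of an old-table bin once its nodes have been moved.
    static final class ForwardingNode<K,V> extends Node<K,V> {
        final Node<K,V>[] nextTable;
        ForwardingNode(Node<K,V>[] nextTable) {
            super(MOVED, null, null, null);   // negative hash, null key and value
            this.nextTable = nextTable;
        }
    }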
Each bin transfer requires its bin lock, which can stall waiting for locks while resizing. However, because other threads can join in and help resize rather than contend for locks, average aggregate waits become shorter as resizing progresses. The transfer operation must also ensure that all accessible bins in both the old and new table are usable by any traversal. This is arranged in part by proceeding from the last bin (table.length - 1) up towards the first. Upon seeing a forwarding node, traversals (see class Traverser) arrange to move to the new table without revisiting nodes. To ensure that no intervening nodes are skipped even when moved out of order, a stack (see class TableStack) is created on first encounter of a forwarding node during a traversal, to maintain its place if later processing the current table. The need for these save/restore mechanics is relatively rare, but when one forwarding node is encountered, typically many more will be. So Traversers use a simple caching scheme to avoid creating so many new TableStack nodes. (Thanks to Peter Levart for suggesting use of a stack here.)
The traversal scheme also applies to partial traversals of ranges of bins (via an alternate Traverser constructor) to support partitioned aggregate operations. Also, read-only operations give up if ever forwarded to a null table, which provides support for shutdown-style clearing, which is also not currently implemented.
Lazy table initialization minimizes footprint until first use, and also avoids resizings when the first operation is from a putAll, constructor with map argument, or deserialization. These cases attempt to override the initial capacity settings, but harmlessly fail to take effect in cases of races.
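A sketch of this lazy, race-tolerant initialization, using AtomicInteger as the sizeCtl gate to keep it self-contained (the real class updates a plain int field through Unsafe, and also folds resize control into the same field):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch: the thread that CASes sizeCtl to -1 creates the array; threads
    // that lose the race yield and then use the table it published.
    final class LazyTable<K,V> {
        private volatile Node<K,V>[] table;
        private final AtomicInteger sizeCtl = new AtomicInteger(0); // 0 = use default capacity

        @SuppressWarnings("unchecked")
        Node<K,V>[] initTable(int defaultCapacity) {
            Node<K,V>[] tab;
            while ((tab = table) == null) {
                int sc = sizeCtl.get();
                if (sc < 0)
                    Thread.yield();                        // lost the initialization race
                else if (sizeCtl.compareAndSet(sc, -1)) {
                    try {
                        if ((tab = table) == null) {
                            int n = (sc > 0) ? sc : defaultCapacity;
                            tab = (Node<K,V>[]) new Node<?,?>[n];
                            table = tab;
                            sc = n - (n >>> 2);            // next resize threshold (0.75 * n)
                        }
                    } finally {
                        sizeCtl.set(sc);
                    }
                }
            }
            return tab;
        }
    }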
The element count is maintained using a specialization of LongAdder. We need to incorporate a specialization rather than just use a LongAdder in order to access implicit contention-sensing that leads to creation of multiple CounterCells. The counter mechanics avoid contention on updates but can encounter cache thrashing if read too frequently during concurrent access. To avoid reading so often, resizing under contention is attempted only upon adding to a bin already holding two or more nodes. Under uniform hash distributions, the probability of this occurring at threshold is around 13%, meaning that only about 1 in 8 puts check threshold (and after resizing, many fewer do so).
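The counting scheme can be pictured as a much-simplified striped counter. The real specialization creates CounterCells lazily and grows the cell array based on observed contention, but the shape is the same: try a CAS on a base count, and only on contention spread the update over cells; the sum is folded back together on read and is therefore only an estimate under concurrent updates:

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.AtomicLongArray;

    // Simplified sketch of the LongAdder-style counter (fixed cell array
    // instead of the lazily grown CounterCell[] in the real class).
    final class StripedCount {
        private final AtomicLong base = new AtomicLong();
        private final AtomicLongArray cells = new AtomicLongArray(64);

        void add(long x) {
            long b = base.get();
            if (!base.compareAndSet(b, b + x)) {                       // contention on the base
                int i = ThreadLocalRandom.current().nextInt(cells.length());
                cells.addAndGet(i, x);                                  // spread the update
            }
        }

        long sumCount() {                                               // approximate under contention
            long sum = base.get();
            for (int i = 0; i < cells.length(); i++) sum += cells.get(i);
            return sum;
        }
    }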
TreeBins use a special form of comparison for search and related operations (which is the main reason we cannot use existing collections such as TreeMaps). TreeBins contain Comparable elements, but may contain others, as well as elements that are Comparable but not necessarily Comparable for the same T, so we cannot invoke compareTo among them. To handle this, the tree is ordered primarily by hash value, then by Comparable.compareTo order if applicable. On lookup at a node, if elements are not comparable or compare as 0 then both left and right children may need to be searched in the case of tied hash values. (This corresponds to the full list search that would be necessary if all elements were non-Comparable and had tied hashes.) On insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers. The red-black balancing code is updated from pre-jdk-collections (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java) based in turn on Cormen, Leiserson, and Rivest "Introduction to Algorithms" (CLR).
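The insertion-time tie-breaker looks roughly like this in the JDK source: order by class name first, and only then by identityHashCode, so rebalancing always sees a consistent total order:

    // Used only when hashes are equal and the keys are not mutually
    // Comparable (or compare as 0); never returns 0.
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b)) ? -1 : 1;
        return d;
    }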
TreeBins also require an additional locking mechanism. While list traversal is always possible by readers even during updates, tree traversal is not, mainly because of tree-rotations that may change the root node and/or its linkages. TreeBins include a simple read-write lock mechanism parasitic on the main bin-synchronization strategy: Structural adjustments associated with an insertion or removal are already bin-locked (and so cannot conflict with other writers) but must wait for ongoing readers to finish. Since there can be only one such waiter, we use a simple scheme using a single "waiter" field to block writers. However, readers need never block. If the root lock is held, they proceed along the slow traversal path (via next-pointers) until the lock becomes available or the list is exhausted, whichever comes first. These cases are not fast, but maximize aggregate expected throughput.
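A sketch of that root lock, using AtomicInteger for the lock state and omitting the reader path (in the real TreeBin, readers bump the count in units of READER along the tree path, and the last reader departing unparks the single waiting writer):

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.LockSupport;

    // Sketch of the TreeBin root lock. Writers already hold the bin lock,
    // so at most one writer can be waiting at a time; readers never block.
    final class TreeRootLock {
        static final int WRITER = 1; // set while the tree is being restructured
        static final int WAITER = 2; // set when a writer is parked waiting
        static final int READER = 4; // increment per reader on the tree path
        final AtomicInteger lockState = new AtomicInteger();
        volatile Thread waiter;

        void lockRoot() {                 // called by a bin-locked writer
            if (!lockState.compareAndSet(0, WRITER))
                contendedLock();
        }

        void unlockRoot() {
            lockState.set(0);
        }

        private void contendedLock() {
            boolean waiting = false;
            for (;;) {
                int s = lockState.get();
                if ((s & ~WAITER) == 0) {                       // no readers, no writer
                    if (lockState.compareAndSet(s, WRITER)) {
                        if (waiting) waiter = null;
                        return;
                    }
                } else if ((s & WAITER) == 0) {                 // announce the wait
                    if (lockState.compareAndSet(s, s | WAITER)) {
                        waiting = true;
                        waiter = Thread.currentThread();
                    }
                } else if (waiting) {
                    LockSupport.park(this);                     // unparked by the last reader
                }
            }
        }
    }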
Maintaining API and serialization compatibility with previous versions of this class introduces several oddities. Mainly: We leave untouched but unused constructor arguments referring to concurrencyLevel. We accept a loadFactor constructor argument, but apply it only to initial table capacity (which is the only time that we can guarantee to honor it). We also declare an unused "Segment" class that is instantiated in minimal form only when serializing.
Also, solely for compatibility with previous versions of this class, it extends AbstractMap, even though all of its methods are overridden, so it is just useless baggage. This file is organized to make things a little easier to follow while reading than they might otherwise: First the main static declarations and utilities, then fields, then main public methods (with a few factorings of multiple public methods into internal ones), then sizing methods, trees, traversers, and bulk operations.