Concurrency Programming: Principles and Practice (37): Core Features of the High-Concurrency Workhorse ConcurrentHashMap

This article covers ConcurrentHashMap, a frequent topic in concurrent programming and in interviews. We start with the official Javadoc to build an overall picture of ConcurrentHashMap.

A Hash Table Built for High Concurrency

ConcurrentHashMap was introduced mainly to solve HashMap's thread-safety problems in multithreaded environments, and to overcome the performance bottleneck caused by Hashtable's single global lock.

/**
 * A hash table supporting full concurrency of retrievals and
 * high expected concurrency for updates. This class obeys the
 * same functional specification as {@link java.util.Hashtable}, and
 * includes versions of methods corresponding to each method of
 * {@code Hashtable}. However, even though all operations are
 * thread-safe, retrieval operations do <em>not</em> entail locking,
 * and there is <em>not</em> any support for locking the entire table
 * in a way that prevents all access.  This class is fully
 * interoperable with {@code Hashtable} in programs that rely on its
 * thread safety but not on its synchronization details.
 */

In other words: this is a hash table supporting full concurrency of retrievals and high expected concurrency for updates. It obeys the same functional specification as java.util.Hashtable and includes a version of every Hashtable method. However, even though all operations are thread-safe, retrievals involve no locking, and there is no support for locking the entire table to block all access. In programs that rely on Hashtable's thread safety but not on its synchronization details, this class is fully interoperable with it.

/* <p>Retrieval operations (including {@code get}) generally do not
 * block, so may overlap with update operations (including {@code put}
 * and {@code remove}). Retrievals reflect the results of the most
 * recently <em>completed</em> update operations holding upon their
 * onset. (More formally, an update operation for a given key bears a
 * <em>happens-before</em> relation with any (non-null) retrieval for
 * that key reporting the updated value.)  For aggregate operations
 * such as {@code putAll} and {@code clear}, concurrent retrievals may
 * reflect insertion or removal of only some entries.  Similarly,
 * Iterators, Spliterators and Enumerations return elements reflecting the
 * state of the hash table at some point at or since the creation of the
 * iterator/enumeration.  They do <em>not</em> throw {@link
 * java.util.ConcurrentModificationException ConcurrentModificationException}.
 * However, iterators are designed to be used by only one thread at a time.
 * Bear in mind that the results of aggregate status methods including
 * {@code size}, {@code isEmpty}, and {@code containsValue} are typically
 * useful only when a map is not undergoing concurrent updates in other threads.
 * Otherwise the results of these methods reflect transient states
 * that may be adequate for monitoring or estimation purposes, but not
 * for program control.
 */

Writes Happen-Before Reads

Retrieval operations (including get) generally do not block, so they may overlap with update operations such as put and remove. A retrieval reflects the results of the most recently completed update operations holding at its onset. (More formally, an update operation for a given key bears a happens-before relation with any non-null retrieval for that key that reports the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect the insertion or removal of only some entries. Similarly, Iterators, Spliterators, and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration; they do not throw java.util.ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods such as size, isEmpty, and containsValue are typically useful only when the map is not undergoing concurrent updates in other threads; otherwise these methods report transient states that may be adequate for monitoring or estimation, but not for program control.
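The happens-before guarantee described above can be seen in a minimal sketch (the class name and the "answer" key are illustrative, not from the source):

```java
import java.util.concurrent.ConcurrentHashMap;

public class HappensBeforeDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

        // The writer's put establishes a happens-before edge with any
        // retrieval of "answer" that returns the updated value.
        Thread writer = new Thread(() -> map.put("answer", 42));
        writer.start();
        writer.join(); // the update has completed before the read below starts

        // A retrieval starting after a completed update is guaranteed to see it.
        System.out.println(map.get("answer")); // 42
    }
}
```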

/* <p>The table is dynamically expanded when there are too many
 * collisions (i.e., keys that have distinct hash codes but fall into
 * the same slot modulo the table size), with the expected average
 * effect of maintaining roughly two bins per mapping (corresponding
 * to a 0.75 load factor threshold for resizing). There may be much
 * variance around this average as mappings are added and removed, but
 * overall, this maintains a commonly accepted time/space tradeoff for
 * hash tables.  However, resizing this or any other kind of hash
 * table may be a relatively slow operation. When possible, it is a
 * good idea to provide a size estimate as an optional {@code
 * initialCapacity} constructor argument. An additional optional
 * {@code loadFactor} constructor argument provides a further means of
 * customizing initial table capacity by specifying the table density
 * to be used in calculating the amount of space to allocate for the
 * given number of elements.  Also, for compatibility with previous
 * versions of this class, constructors may optionally specify an
 * expected {@code concurrencyLevel} as an additional hint for
 * internal sizing.  Note that using many keys with exactly the same
 * {@code hashCode()} is a sure way to slow down performance of any
 * hash table. To ameliorate impact, when keys are {@link Comparable},
 * this class may use comparison order among keys to help break ties.
 */

Collisions and Resizing

The table is dynamically resized when there are too many collisions (keys that have distinct hash codes but fall into the same slot modulo the table size), with the expected average effect of maintaining roughly two bins per mapping (a mapping being one key-value association, i.e. a key-value pair), corresponding to a 0.75 load-factor threshold for resizing. There may be much variance around this average as mappings are added and removed, but overall this maintains a commonly accepted time/space tradeoff for hash tables. Resizing this, or any other, hash table can be a relatively slow operation, so when possible it is a good idea to provide a size estimate via the optional initialCapacity constructor argument. The additional optional loadFactor constructor argument further customizes the initial table capacity by specifying the table density to use when calculating the space to allocate for a given number of elements. Also, for compatibility with previous versions of this class, constructors may optionally specify an expected concurrencyLevel as an additional hint for internal sizing. Note that using many keys with exactly the same hashCode() is a sure way to slow down any hash table; to mitigate the impact, when keys are Comparable, this class may use the comparison order among keys to help break ties.
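The sizing hints mentioned above look like this in practice (the estimate of 10,000 entries is an arbitrary illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizingDemo {
    public static void main(String[] args) {
        // If roughly 10_000 entries are expected, passing a size estimate
        // up front avoids repeated (and relatively slow) resizing.
        ConcurrentHashMap<String, String> byCapacity =
                new ConcurrentHashMap<>(10_000);

        // initialCapacity, loadFactor, concurrencyLevel: the latter two are
        // extra sizing hints; concurrencyLevel is retained for compatibility
        // with pre-Java-8 versions of the class.
        ConcurrentHashMap<String, String> fullyTuned =
                new ConcurrentHashMap<>(10_000, 0.75f, 16);

        System.out.println(byCapacity.isEmpty() && fullyTuned.isEmpty()); // true
    }
}
```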

Key and Value Characteristics

/* <p>A {@link Set} projection of a ConcurrentHashMap may be created
 * (using {@link #newKeySet()} or {@link #newKeySet(int)}), or viewed
 * (using {@link #keySet(Object)} when only keys are of interest, and the
 * mapped values are (perhaps transiently) not used or all take the
 * same mapping value.
 *
 * <p>A ConcurrentHashMap can be used as a scalable frequency map (a
 * form of histogram or multiset) by using {@link
 * java.util.concurrent.atomic.LongAdder} values and initializing via
 * {@link #computeIfAbsent computeIfAbsent}. For example, to add a count
 * to a {@code ConcurrentHashMap<String,LongAdder> freqs}, you can use
 * {@code freqs.computeIfAbsent(key, k -> new LongAdder()).increment();}
 *
 * <p>This class and its views and iterators implement all of the
 * <em>optional</em> methods of the {@link Map} and {@link Iterator}
 * interfaces.
 *
 * <p>Like {@link Hashtable} but unlike {@link HashMap}, this class
 * does <em>not</em> allow {@code null} to be used as a key or value.
 *
 */

A Set projection of a ConcurrentHashMap can be obtained in two ways: created with newKeySet() or newKeySet(int), or viewed over an existing map with keySet(Object). This is especially useful when only the keys are of interest and the mapped values are (perhaps transiently) unused, or all keys take the same mapped value.
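Both set projections can be sketched briefly (the key names are illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeySetDemo {
    public static void main(String[] args) {
        // A concurrent Set backed by a ConcurrentHashMap: useful when only
        // membership matters and no values are needed.
        Set<String> seen = ConcurrentHashMap.newKeySet();
        seen.add("a");
        seen.add("a"); // duplicate, ignored
        System.out.println(seen.size()); // 1

        // keySet(mappedValue) views an existing map as a Set; add() inserts
        // the supplied default value for new keys.
        ConcurrentHashMap<String, Boolean> map = new ConcurrentHashMap<>();
        Set<String> view = map.keySet(Boolean.TRUE);
        view.add("b");
        System.out.println(map.get("b")); // true
    }
}
```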

A ConcurrentHashMap can serve as a scalable frequency map (a form of histogram or multiset) by using java.util.concurrent.atomic.LongAdder values and initializing them via computeIfAbsent. For example, to add a count to a ConcurrentHashMap<String,LongAdder> freqs, use: freqs.computeIfAbsent(key, k -> new LongAdder()).increment();
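The one-liner above expands into a runnable sketch (the sample words are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class FreqDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
        for (String word : new String[] {"a", "b", "a", "a"}) {
            // computeIfAbsent atomically installs a LongAdder on first sight;
            // increment() is then a cheap, contention-friendly update.
            freqs.computeIfAbsent(word, k -> new LongAdder()).increment();
        }
        System.out.println(freqs.get("a").sum()); // 3
    }
}
```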

This class, together with its views and iterators, implements all of the optional methods of the Map and Iterator interfaces.

Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or a value.
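A quick illustration of the null policy (the key names are arbitrary):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

        // Because null can never be stored, get(...) == null unambiguously
        // means "no mapping present".
        System.out.println(map.get("missing")); // null

        try {
            map.put("key", null); // rejected, unlike HashMap
        } catch (NullPointerException expected) {
            System.out.println("null values are rejected");
        }
    }
}
```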

Bulk Operations

/* <p>ConcurrentHashMaps support a set of sequential and parallel bulk
 * operations that, unlike most {@link Stream} methods, are designed
 * to be safely, and often sensibly, applied even with maps that are
 * being concurrently updated by other threads; for example, when
 * computing a snapshot summary of the values in a shared registry.
 * There are three kinds of operation, each with four forms, accepting
 * functions with keys, values, entries, and (key, value) pairs as
 * arguments and/or return values. Because the elements of a
 * ConcurrentHashMap are not ordered in any particular way, and may be
 * processed in different orders in different parallel executions, the
 * correctness of supplied functions should not depend on any
 * ordering, or on any other objects or values that may transiently
 * change while computation is in progress; and except for forEach
 * actions, should ideally be side-effect-free. Bulk operations on
 * {@link Map.Entry} objects do not support method {@code setValue}.
 *
 * <ul>
 * <li>forEach: Performs a given action on each element.
 * A variant form applies a given transformation on each element
 * before performing the action.
 *
 * <li>search: Returns the first available non-null result of
 * applying a given function on each element; skipping further
 * search when a result is found.
 *
 * <li>reduce: Accumulates each element.  The supplied reduction
 * function cannot rely on ordering (more formally, it should be
 * both associative and commutative).  There are five variants:
 *
 * <ul>
 *
 * <li>Plain reductions. (There is not a form of this method for
 * (key, value) function arguments since there is no corresponding
 * return type.)
 *
 * <li>Mapped reductions that accumulate the results of a given
 * function applied to each element.
 *
 * <li>Reductions to scalar doubles, longs, and ints, using a
 * given basis value.
 *
 * </ul>
 * </ul>
 */

ConcurrentHashMaps support a set of sequential and parallel bulk operations that, unlike most Stream methods, are designed to be applied safely, and often sensibly, even to maps that are being concurrently updated by other threads, for example when computing a snapshot summary of the values in a shared registry. There are three kinds of operation, each with four forms, accepting functions that take keys, values, entries, or (key, value) pairs as arguments and/or return values. Because the elements of a ConcurrentHashMap are not ordered in any particular way, and may be processed in different orders in different parallel executions, the correctness of supplied functions should not depend on any ordering, nor on any other objects or values that may change transiently while the computation is in progress; and except for forEach actions, they should ideally be side-effect-free. Bulk operations on Map.Entry objects do not support the setValue method.

  • forEach: performs a given action on each element. A variant form applies a given transformation to each element before performing the action.
  • search: returns the first available non-null result of applying a given function to each element, skipping further search once a result is found.
  • reduce: accumulates each element. The supplied reduction function cannot rely on ordering (more formally, it should be both associative and commutative). There are five variants:
    • Plain reductions. (There is no form of this method for (key, value) function arguments, since there is no corresponding return type.)
    • Mapped reductions that accumulate the results of a given function applied to each element.
    • Reductions to scalar doubles, longs, and ints, using a given basis value.
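The three kinds of operation can be sketched with their (key, value) and value forms (the map contents here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class BulkOpsDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // forEach: a side-effecting action applied to each (key, value) pair.
        map.forEach(Long.MAX_VALUE, (k, v) -> System.out.println(k + "=" + v));

        // search: the first non-null result wins; here, any key whose value > 2.
        String found = map.search(Long.MAX_VALUE, (k, v) -> v > 2 ? k : null);
        System.out.println(found); // c

        // reduce: order-independent accumulation (here, the sum of all values).
        Integer sum = map.reduceValues(Long.MAX_VALUE, Integer::sum);
        System.out.println(sum); // 6
    }
}
```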
/* <p>These bulk operations accept a {@code parallelismThreshold}
 * argument. Methods proceed sequentially if the current map size is
 * estimated to be less than the given threshold. Using a value of
 * {@code Long.MAX_VALUE} suppresses all parallelism.  Using a value
 * of {@code 1} results in maximal parallelism by partitioning into
 * enough subtasks to fully utilize the {@link
 * ForkJoinPool#commonPool()} that is used for all parallel
 * computations. Normally, you would initially choose one of these
 * extreme values, and then measure performance of using in-between
 * values that trade off overhead versus throughput.
 */

These bulk operations accept a parallelismThreshold argument. A method proceeds sequentially if the current map size is estimated to be below the given threshold. Using Long.MAX_VALUE suppresses all parallelism, while using 1 yields maximal parallelism by partitioning the work into enough subtasks to fully utilize the ForkJoinPool#commonPool() that is used for all parallel computations. Normally you would start with one of these extreme values, then measure the performance of in-between values that trade off overhead against throughput.
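A sketch contrasting the two extreme threshold values; the result is identical either way, only the execution strategy differs:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ThresholdDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 100; i++) map.put(i, i);

        // Long.MAX_VALUE: always sequential (the size never exceeds the threshold).
        int seq = map.reduceValuesToInt(Long.MAX_VALUE, v -> v, 0, Integer::sum);

        // 1: maximal parallelism via ForkJoinPool.commonPool().
        int par = map.reduceValuesToInt(1L, v -> v, 0, Integer::sum);

        // Sum of 0..99 is 4950 under either strategy.
        System.out.println(seq == par && seq == 4950); // true
    }
}
```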

/* <p>The concurrency properties of bulk operations follow
 * from those of ConcurrentHashMap: Any non-null result returned
 * from {@code get(key)} and related access methods bears a
 * happens-before relation with the associated insertion or
 * update.  The result of any bulk operation reflects the
 * composition of these per-element relations (but is not
 * necessarily atomic with respect to the map as a whole unless it
 * is somehow known to be quiescent).  Conversely, because keys
 * and values in the map are never null, null serves as a reliable
 * atomic indicator of the current lack of any result.  To
 * maintain this property, null serves as an implicit basis for
 * all non-scalar reduction operations. For the double, long, and
 * int versions, the basis should be one that, when combined with
 * any other value, returns that other value (more formally, it
 * should be the identity element for the reduction). Most common
 * reductions have these properties; for example, computing a sum
 * with basis 0 or a minimum with basis MAX_VALUE.
 */

The concurrency properties of bulk operations follow from those of ConcurrentHashMap: any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations (but is not necessarily atomic with respect to the map as a whole unless the map is somehow known to be quiescent). Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current absence of any result. To maintain this property, null serves as an implicit basis for all non-scalar reduction operations. For the double, long, and int versions, the basis should be a value that, when combined with any other value, returns that other value (more formally, it should be the identity element for the reduction). Most common reductions have this property; for example, computing a sum with basis 0, or a minimum with basis MAX_VALUE.
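The identity-basis rule can be illustrated with the int reductions (the stored values are arbitrary):

```java
import java.util.concurrent.ConcurrentHashMap;

public class BasisDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 7);
        map.put("b", 3);
        map.put("c", 9);

        // The basis must be the identity of the reduction: 0 for sum,
        // Integer.MAX_VALUE for min, so empty partitions contribute nothing.
        int sum = map.reduceValuesToInt(Long.MAX_VALUE, v -> v, 0, Integer::sum);
        int min = map.reduceValuesToInt(Long.MAX_VALUE, v -> v,
                Integer.MAX_VALUE, Math::min);

        System.out.println(sum + " " + min); // 19 3
    }
}
```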

 /* <p>Search and transformation functions provided as arguments
 * should similarly return null to indicate the lack of any result
 * (in which case it is not used). In the case of mapped
 * reductions, this also enables transformations to serve as
 * filters, returning null (or, in the case of primitive
 * specializations, the identity basis) if the element should not
 * be combined. You can create compound transformations and
 * filterings by composing them yourself under this "null means
 * there is nothing there now" rule before using them in search or
 * reduce operations.
 */

Search and transformation functions provided as arguments should similarly return null to indicate the absence of any result (in which case it is not used). For mapped reductions, this also lets transformations serve as filters: return null (or, for the primitive specializations, the identity basis) if the element should not be combined. You can create compound transformations and filterings by composing them yourself under this "null means there is nothing there now" rule before using them in search or reduce operations.
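A sketch of a transformation acting as a filter under the "null means there is nothing there now" rule (the map contents are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class FilterReduceDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("apple", 5);
        map.put("plum", 2);
        map.put("apricot", 7);

        // The transformer doubles as a filter: returning null drops the
        // element from the reduction entirely.
        Integer sumOfBig = map.reduceValues(Long.MAX_VALUE,
                v -> v >= 5 ? v : null,   // keep only values >= 5
                Integer::sum);

        System.out.println(sumOfBig); // 12 (5 + 7; "plum" is filtered out)
    }
}
```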

/* <p>Methods accepting and/or returning Entry arguments maintain
 * key-value associations. They may be useful for example when
 * finding the key for the greatest value. Note that "plain" Entry
 * arguments can be supplied using {@code new
 * AbstractMap.SimpleEntry(k,v)}.
 *
 * <p>Bulk operations may complete abruptly, throwing an
 * exception encountered in the application of a supplied
 * function. Bear in mind when handling such exceptions that other
 * concurrently executing functions could also have thrown
 * exceptions, or would have done so if the first exception had
 * not occurred.
 */

Methods that accept and/or return Entry arguments maintain key-value associations. They can be useful, for example, when finding the key for the greatest value. Note that "plain" Entry arguments can be supplied using new AbstractMap.SimpleEntry(k,v).
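The "key for the greatest value" use case can be sketched with an entry reduction (the names and scores are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MaxEntryDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> scores = new ConcurrentHashMap<>();
        scores.put("alice", 80);
        scores.put("bob", 95);
        scores.put("carol", 90);

        // reduceEntries keeps whichever entry has the greater value, so the
        // winner's key is the key associated with the maximum.
        Map.Entry<String, Integer> best = scores.reduceEntries(Long.MAX_VALUE,
                (e1, e2) -> e1.getValue() >= e2.getValue() ? e1 : e2);

        System.out.println(best.getKey()); // bob
    }
}
```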

Bulk operations may complete abruptly by throwing an exception encountered while applying a supplied function. When handling such exceptions, bear in mind that other concurrently executing functions could also have thrown exceptions, or would have done so had the first exception not occurred.

/* <p>Speedups for parallel compared to sequential forms are common
 * but not guaranteed.  Parallel operations involving brief functions
 * on small maps may execute more slowly than sequential forms if the
 * underlying work to parallelize the computation is more expensive
 * than the computation itself.  Similarly, parallelization may not
 * lead to much actual parallelism if all processors are busy
 * performing unrelated tasks.
 *
 * <p>All arguments to all task methods must be non-null.
 *
 * <p>This class is a member of the
 * <a href=" ">
 * Java Collections Framework</a>.
 */

Speedups from the parallel forms, compared to the sequential ones, are common but not guaranteed. Parallel operations involving brief functions on small maps may run slower than their sequential forms if the underlying work of parallelizing the computation costs more than the computation itself. Similarly, parallelization may not yield much actual parallelism if all processors are busy with unrelated tasks.

All arguments to all task methods must be non-null.

This class is a member of the Java Collections Framework.

Summary

1. ConcurrentHashMap was introduced mainly to solve HashMap's thread-safety problems in multithreaded environments and to overcome the performance bottleneck caused by Hashtable's global lock.

2. It is a hash table supporting full concurrency of retrievals and high expected concurrency for updates. All operations are thread-safe, but retrievals involve no locking, and there is no support for locking the entire table to block all access.

3. A retrieval reflects the results of the most recently completed update operations holding at its onset.

4. The table is dynamically resized when there are too many collisions (keys with distinct hash codes that fall into the same slot modulo the table size).

5. Like Hashtable but unlike HashMap, ConcurrentHashMap does not allow null keys or values.

6. ConcurrentHashMap supports three kinds of safe sequential and parallel bulk operations, each in four forms (operating on keys, values, entries, or (key, value) pairs).

  • The three kinds of bulk operation:

forEach: performs an action on every element (e.g. counting, printing); unlike the functions passed to search and reduce, forEach actions may have side effects.
search: finds an element matching a condition (e.g. conditional filtering).
reduce: aggregates all elements (e.g. summing, concatenating).

  • Each kind supports four processing dimensions:

keys only;
values only;
entries (key-value pairs);
(key, value) as a pair of separate arguments.
