Java theory and practice: Hashing it out

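This article covers the equals() and hashCode() methods in Java. It introduces how equality is defined, uses the Integer class to illustrate overriding these methods, explains why and how to override them, and points out weaknesses in the hashing and equality implementations of the Java class library, such as the small hash range. It concludes that defining these methods well makes a class more usable in hash-based collections.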
Defining hashCode() and equals() effectively and correctly

Level: Introductory

Brian Goetz (brian@quiotix.com)
Principal Consultant, Quiotix Corp
27 May 2003

Every Java object has a hashCode() and an equals() method. Many classes override the default implementations of these methods to provide a higher degree of semantic comparability between object instances. In this installment of Java theory and practice, Java developer Brian Goetz shows you the rules and guidelines you should follow when creating Java classes in order to define hashCode() and equals() effectively and appropriately.

While the Java language does not provide direct support for associative arrays -- arrays that can take any object as an index -- the presence of the hashCode() method in the root Object class clearly anticipates the ubiquitous use of HashMap (and its predecessor, Hashtable). Under ideal conditions, hash-based containers offer both efficient insertion and efficient retrieval; supporting hashing directly in the object model facilitates the development and use of hash-based containers.

Defining equality
The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of these methods, you must override both, as there are important relationships between them that must be maintained. In particular, if two objects are equal according to the equals() method, they must have the same hashCode() value (although the reverse is not generally true).

The semantics of equals() for a given class are left to the implementer; defining what equals() means for a given class is part of the design work for that class. The default implementation, provided by Object, is simply reference equality:


  public boolean equals(Object obj) { 
    return (this == obj); 
  }

Under this default implementation, two references are equal only if they refer to the exact same object. Similarly, the default implementation of hashCode() provided by Object is derived by mapping the memory address of the object to an integer value. Because on some architectures the address space is larger than the range of values for int, it is possible that two distinct objects could have the same hashCode(). If you override hashCode(), you can still use the System.identityHashCode() method to access this default value.
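As a minimal sketch of that last point, the following compares an overridden, state-based hashCode() with the identity hash. The class names IdentityHashDemo and Label are hypothetical, introduced only for illustration; Object.hashCode() and System.identityHashCode() are the real APIs.

```java
// Hypothetical demo class; Label is an illustrative type, not part of the article.
class IdentityHashDemo {
    // A small class that overrides equals() and hashCode() based on its state.
    static final class Label {
        final String text;

        Label(String text) { this.text = text; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Label && text.equals(((Label) o).text);
        }

        @Override
        public int hashCode() { return text.hashCode(); }
    }

    public static void main(String[] args) {
        Label a = new Label("x");
        Label b = new Label("x");

        // State-based hash: equal objects report the same value.
        System.out.println(a.hashCode() == b.hashCode());

        // The identity hash is still available and is stable for a given object.
        // It usually differs between a and b, but that is not guaranteed.
        System.out.println(System.identityHashCode(a) + " vs " + System.identityHashCode(b));

        // For a class that does not override hashCode(), the two agree.
        Object plain = new Object();
        System.out.println(plain.hashCode() == System.identityHashCode(plain));
    }
}
```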

Overriding equals() -- a simple example
An identity-based implementation for equals() and hashCode() is a sensible default, but for some classes, it is desirable to relax the definition of equality somewhat. For example, the Integer class defines equals() similarly to this:


  public boolean equals(Object obj) {
    return (obj instanceof Integer 
            && intValue() == ((Integer) obj).intValue());
  }

Under this definition, two Integer objects are equal only if they contain the same integer value. This, along with Integer being immutable, makes it practical to use an Integer as a key in a HashMap. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such as Integer, Float, Character, and Boolean, as well as String (two String objects are equal if they contain the same sequence of characters). Because these classes are immutable and implement hashCode() and equals() sensibly, they all make good hash keys.
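To illustrate why value-based equality matters for keys, here is a short sketch that stores a value under one Integer and retrieves it with another Integer holding the same value. The class name IntegerKeyDemo and the value 100000 are arbitrary choices for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing an Integer used as a HashMap key.
class IntegerKeyDemo {
    static String lookup() {
        Map<Integer, String> names = new HashMap<>();

        Integer putKey = Integer.valueOf(100000);
        names.put(putKey, "one hundred thousand");

        // Built separately; for a value this large it is likely (though not
        // guaranteed) to be a different Integer instance from putKey.
        Integer getKey = Integer.valueOf(Integer.parseInt("100000"));

        // Retrieval depends on equals() and hashCode(), not on object identity.
        return names.get(getKey);
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```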

Why override equals() and hashCode()?
What would happen if Integer did not override equals() and hashCode()? Nothing, if we never used an Integer as a key in a HashMap or other hash-based collection. However, if we were to use such an Integer object for a key in a HashMap, we would not be able to reliably retrieve the associated value, unless we used the exact same Integer instance in the get() call as we did in the put() call. This would require ensuring that we only use a single instance of the Integer object corresponding to a particular integer value throughout our program. Needless to say, this approach would be inconvenient and error prone.
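The failure described above can be sketched with a hypothetical wrapper, here called RawInt, that holds an int but inherits the identity-based equals() and hashCode() from Object. Neither RawInt nor IdentityKeyDemo comes from the article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a key type that does NOT override equals() or hashCode().
class IdentityKeyDemo {
    static final class RawInt {
        final int value;

        RawInt(int value) { this.value = value; }
        // No equals()/hashCode() overrides: identity semantics from Object apply.
    }

    // Retrieval works when the very same instance is used for put() and get().
    static boolean foundWithSameInstance() {
        Map<RawInt, String> map = new HashMap<>();
        RawInt key = new RawInt(42);
        map.put(key, "answer");
        return map.get(key) != null;
    }

    // Retrieval fails with a second instance, even though it holds the same value.
    static boolean foundWithEqualValue() {
        Map<RawInt, String> map = new HashMap<>();
        map.put(new RawInt(42), "answer");
        return map.get(new RawInt(42)) != null;
    }

    public static void main(String[] args) {
        System.out.println(foundWithSameInstance());
        System.out.println(foundWithEqualValue());
    }
}
```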

The interface contract for Object requires that if two objects are equal according to equals(), then they must have the same hashCode() value. Why does our root object class need hashCode(), when its discriminating ability is entirely subsumed by that of equals()? The hashCode() method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes -- such as Hashtable, HashMap, and HashSet -- in typical Java applications, and comparing against many objects with equals() can be computationally expensive. Having every Java object support hashCode() allows for efficient storage and retrieval using hash-based collections.
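To make the efficiency argument concrete, here is a deliberately simplified sketch of a hash-based set. It is not how java.util.HashSet is implemented; TinyHashSet and its bucket layout are assumptions made for illustration. The point is that hashCode() selects one bucket, so equals() only has to be called on the few objects in that bucket rather than on every stored object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hash set; null elements are not supported.
class TinyHashSet {
    private final List<List<Object>> buckets = new ArrayList<>();

    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() narrows the search to a single bucket.
    private List<Object> bucketFor(Object o) {
        int index = (o.hashCode() & 0x7fffffff) % buckets.size();
        return buckets.get(index);
    }

    // equals() (via List.contains) is only applied within that bucket.
    boolean add(Object o) {
        List<Object> bucket = bucketFor(o);
        if (bucket.contains(o)) {
            return false;
        }
        bucket.add(o);
        return true;
    }

    boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }

    public static void main(String[] args) {
        TinyHashSet set = new TinyHashSet(16);
        System.out.println(set.add("a"));
        System.out.println(set.add(new String("a"))); // equal to "a", so rejected
        System.out.println(set.contains("b"));
    }
}
```

This sketch also shows why equal objects must have equal hash codes: if they did not, the second "a" would be sent to a different bucket and the duplicate would go unnoticed.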

Requirements for implementing equals() and hashCode()
There are some restrictions placed on the behavior of equals() and hashCode(), which are enumerated in the documentation for Object. In particular, the equals() method must exhibit the following properties:

  • Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
  • Reflexivity: For all non-null references, a.equals(a)
  • Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
  • Consistency with hashCode(): Two equal objects must have the same hashCode() value

The specification for Object offers a vague guideline that equals() and hashCode() be consistent -- that their results will be the same for subsequent invocations, provided that "no information used in equals comparison on the object is modified." This sounds sort of like "the result of the calculation shouldn't change, unless it does." This vague statement is generally interpreted to mean that equality and hash value calculations should be a deterministic function of an object's state and nothing else.
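These properties can be spot-checked for particular objects. The helper below is a rough sketch, not an exhaustive verifier: it only tests the three instances it is given, and the name EqualsContractCheck is hypothetical.

```java
// Hypothetical helper that spot-checks the equals()/hashCode() contract
// for three non-null sample objects.
class EqualsContractCheck {
    static boolean holdsFor(Object a, Object b, Object c) {
        boolean reflexive = a.equals(a) && b.equals(b) && c.equals(c);

        boolean symmetric = a.equals(b) == b.equals(a)
                && b.equals(c) == c.equals(b)
                && a.equals(c) == c.equals(a);

        // Checked for the a -> b -> c chain only.
        boolean transitive = !(a.equals(b) && b.equals(c)) || a.equals(c);

        // Equal objects must report the same hash code.
        boolean hashConsistent = (!a.equals(b) || a.hashCode() == b.hashCode())
                && (!b.equals(c) || b.hashCode() == c.hashCode())
                && (!a.equals(c) || a.hashCode() == c.hashCode());

        return reflexive && symmetric && transitive && hashConsistent;
    }

    public static void main(String[] args) {
        System.out.println(holdsFor("a", new String("a"), "a"));
        System.out.println(holdsFor(Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3)));
    }
}
```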

What should equality mean?
The requirements for equals() and hashCode() imposed by the Object class specification are fairly simple to follow. Deciding whether, and how, to override equals() requires a little more judgment. In the case of simple immutable value classes, such as Integer (and in fact for nearly all immutable classes), the choice is fairly obvious -- equality should be based on the equality of the underlying object state. In the case of Integer, the object's only state is the underlying integer value.

For mutable objects, the answer is not always so clear. Should equals() and hashCode() be based on the object's identity (like the default implementation) or the object's state (like Integer and String)? There's no easy answer -- it depends on the intended use of the class. For containers like List and Map, one could have made a reasonable argument either way. Most classes in the Java class library, including container classes, err on the side of providing an equals() and hashCode() implementation based on the object state.

If an object's hashCode() value can change based on its state, then we must be careful when using such objects as keys in hash-based collections to ensure that we don't allow their state to change when they are being used as hash keys. All hash-based collections assume that an object's hash value does not change while it is in use as a key in the collection. If a key's hash code were to change while it was in a collection, the collection could end up searching in the wrong place for it, so a lookup for that key might fail even though the entry is still present. This is usually not a problem in practice -- it is not common practice to use a mutable object like a List as a key in a HashMap.
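The following sketch shows the hazard with a List used as a key. The class name MutableKeyDemo is hypothetical, and the outcome after mutation is not specified by the collections documentation; the behavior noted in the comments is what common HashMap implementations do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: mutating a key while it is stored in a HashMap.
class MutableKeyDemo {
    static boolean lookupAfterMutation() {
        Map<List<String>, String> map = new HashMap<>();

        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value");

        // Changing the list changes its hashCode(), but the map recorded
        // the entry under the old hash value.
        key.add("b");

        // On common implementations this lookup misses, even though the
        // very same key object is still inside the map.
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());
    }
}
```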

An example of a simple mutable class that defines equals() and hashCode() based on its state is Point. Two Point objects are equal if they refer to the same (x, y) coordinates, and the hash value of a Point is derived from the IEEE 754 bit representation of the x and y coordinate values.

For more complex classes, the behavior of equals() and hashCode() may even be imposed by the specification of a superclass or interface. For example, the List interface requires that a List object is equal to another object if and only if the other object is also a List and they contain the same elements (defined by Object.equals() on the elements) in the same order. The requirements for hashCode() are defined with even more specificity -- the hashCode() value of a list must conform to the following calculation:


  hashCode = 1;
  Iterator i = list.iterator();
  while (i.hasNext()) {
      Object obj = i.next();
      hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
  }

Not only is the hash value dependent on the contents of the list, but the specific algorithm for combining the hash values of the individual elements is specified as well. (The String class specifies a similar algorithm to be used for computing the hash value of a String.)
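As a quick check of the calculation, the sketch below re-implements it and compares the result with what a standard List reports. It also includes the analogous loop for String, which starts from 0 and folds in each character. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo re-implementing the specified List and String hash calculations.
class ListHashDemo {
    // The calculation required by the List interface.
    static int specifiedListHash(List<?> list) {
        int hashCode = 1;
        for (Object obj : list) {
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    // The similar calculation documented for String.
    static int specifiedStringHash(String s) {
        int hashCode = 0;
        for (int i = 0; i < s.length(); i++) {
            hashCode = 31 * hashCode + s.charAt(i);
        }
        return hashCode;
    }

    public static void main(String[] args) {
        List<Object> sample = Arrays.asList("a", null, Integer.valueOf(3));
        System.out.println(specifiedListHash(sample) == sample.hashCode());
        System.out.println(specifiedStringHash("hello") == "hello".hashCode());
    }
}
```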

Writing your own equals() and hashCode() methods
Overriding the default equals() method is fairly easy, but overriding an already overridden equals() method can be extremely tricky to do without violating either the symmetry or transitivity requirement. When overriding equals(), you should always include some Javadoc comments on equals() to help those who might want to extend your class do so correctly.

As a simple example, consider the following class:


  class A {
    final B someNonNullField;
    C someOtherField;
    int someNonStateField;
  }

How would we write the equals() method for this class? This way is suitable for many situations:


  public boolean equals(Object other) {
    // Not strictly necessary, but often a good optimization
    if (this == other)
      return true;
    if (!(other instanceof A))
      return false;
    A otherA = (A) other;
    return 
      (someNonNullField.equals(otherA.someNonNullField))
        && ((someOtherField == null) 
            ? otherA.someOtherField == null 
            : someOtherField.equals(otherA.someOtherField));
  }

Now that we've defined equals(), we have to define hashCode() in a compatible manner. One compatible, but not all that useful, way to define hashCode() is like this:


  public int hashCode() { return 0; }

This approach will yield horrible performance for HashMaps with a large number of entries, but it does conform to the specification. A more sensible implementation of hashCode() for A would be like this:


  public int hashCode() { 
    int hash = 1;
    hash = hash * 31 + someNonNullField.hashCode();
    hash = hash * 31 
                + (someOtherField == null ? 0 : someOtherField.hashCode());
    return hash;
  }

Note that both of these implementations delegate a portion of the computation to the equals() or hashCode() method of the state fields of the class. Depending on your class, you may also want to delegate part of the computation to the equals() or hashCode() function of the superclass. For primitive fields, there are helper functions in the associated wrapper classes that can help in creating hash values, such as Float.floatToIntBits.
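For a class with primitive fields, the same pattern might look like the sketch below. SensorReading and its fields are hypothetical, invented for this example. The float field is compared and hashed through Float.floatToIntBits so that equals() and hashCode() agree, and the long field is folded into an int by combining its upper and lower halves.

```java
// Hypothetical value class mixing an object field with primitive fields.
final class SensorReading {
    private final String sensor;   // assumed non-null
    private final float value;
    private final long timestamp;

    SensorReading(String sensor, float value, long timestamp) {
        this.sensor = sensor;
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other)
            return true;
        if (!(other instanceof SensorReading))
            return false;
        SensorReading that = (SensorReading) other;
        return sensor.equals(that.sensor)
            // Compare bit patterns so NaN equals NaN, matching hashCode() below.
            && Float.floatToIntBits(value) == Float.floatToIntBits(that.value)
            && timestamp == that.timestamp;
    }

    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 31 + sensor.hashCode();
        hash = hash * 31 + Float.floatToIntBits(value);
        hash = hash * 31 + (int) (timestamp ^ (timestamp >>> 32));
        return hash;
    }

    public static void main(String[] args) {
        SensorReading a = new SensorReading("t1", 21.5f, 1000L);
        SensorReading b = new SensorReading("t1", 21.5f, 1000L);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```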

Writing an equals() method is not without pitfalls. In general, it is impractical to cleanly override equals() when extending an instantiable class that itself overrides equals(), and writing an equals() method that is intended to be overridden (such as in an abstract class) is done differently than writing an equals() method for a concrete class. See Effective Java Programming Language Guide, Item 7 (in Resources) for some examples and more details about why this is so.
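One way this goes wrong can be sketched as follows. The classes Base and Tagged are hypothetical and are not taken from the article or the book; they show an "obvious" subclass equals() that adds a field to the comparison and thereby breaks symmetry.

```java
// Hypothetical demo of a symmetry violation when a subclass overrides equals().
class SymmetryPitfall {
    static class Base {
        final int x;

        Base(int x) { this.x = x; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Base && x == ((Base) o).x;
        }

        @Override
        public int hashCode() { return x; }
    }

    static class Tagged extends Base {
        final String tag;   // assumed non-null

        Tagged(int x, String tag) {
            super(x);
            this.tag = tag;
        }

        // The "obvious" override: also compare the new field.
        @Override
        public boolean equals(Object o) {
            return o instanceof Tagged
                && super.equals(o)
                && tag.equals(((Tagged) o).tag);
        }

        @Override
        public int hashCode() { return 31 * super.hashCode() + tag.hashCode(); }
    }

    public static void main(String[] args) {
        Base base = new Base(1);
        Tagged tagged = new Tagged(1, "red");

        // Base only looks at x, so it considers the Tagged instance equal...
        System.out.println(base.equals(tagged));
        // ...but Tagged rejects any object that is not a Tagged.
        System.out.println(tagged.equals(base));
    }
}
```

Relaxing Tagged.equals() to ignore the tag when compared with a plain Base restores symmetry but can then break transitivity, which is why the text calls a clean override impractical.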

Room for improvement?
Building hashing into the root object class of the Java class library was a very sensible design compromise -- it makes using hash-based containers so much easier and more efficient. However, several criticisms have been made of the approach to and implementation of hashing and equality in the Java class library. The hash-based containers in java.util are very convenient and easy to use, but may not be suitable for applications that require very high performance. While most of the design decisions behind these criticisms will never be changed, they are worth keeping in mind when you're designing applications that rely heavily on the efficiency of hash-based containers. These criticisms include:

  • Too small a hash range. Using int, instead of long, for the return type of hashCode() increases the possibility of hash collisions.

  • Bad distribution of hash values. The hash values for short strings and small integers are themselves small integers, and are close to the hash values of other "nearby" objects. A more well-behaved hash function would distribute the hash values more evenly across the hash range.

  • No defined hashing operations. While some classes, such as String and List, define a hash algorithm to be used in combining the hash values of their constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The tricks used by List, String, and the example class A discussed earlier in Writing your own equals() and hashCode() methods are simple, but far from mathematically ideal. Nor does the class library offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticated hashCode() implementations.

  • Difficulty writing equals() when extending an instantiable class that already overrides equals(). The "obvious" ways to define equals() when extending an instantiable class that already overrides equals() all fail to meet the symmetry or transitivity requirements of the equals() method. This means that you must understand the structure and implementation details of classes you are extending when overriding equals(), and may even need to expose private fields in the base class as protected to do so, which violates principles of good object-oriented design.
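The distribution criticism in the list above is easy to observe directly. In this sketch (the class name HashSpreadDemo is hypothetical), the hash of an Integer is simply its value and the hash of a one-character String is that character's code, so neighboring inputs produce neighboring hash values.

```java
import java.util.Arrays;

// Hypothetical demo: hash values of small integers and short strings cluster together.
class HashSpreadDemo {
    static int[] smallIntegerHashes() {
        int[] hashes = new int[5];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = Integer.valueOf(i).hashCode();
        }
        return hashes;
    }

    static int[] singleCharStringHashes() {
        String[] inputs = { "a", "b", "c" };
        int[] hashes = new int[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            hashes[i] = inputs[i].hashCode();
        }
        return hashes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(smallIntegerHashes()));
        System.out.println(Arrays.toString(singleCharStringHashes()));
    }
}
```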

Summary
By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. There are two approaches to defining equality and hash value: identity-based, which is the default provided by Object, and state-based, which requires overriding both equals() and hashCode(). If an object's hash value can change when its state changes, be sure you don't allow its state to change while it is being used as a hash key.

Resources

About the author
Brian Goetz has been a professional software developer for the past 15 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California. See Brian's published and upcoming articles in popular industry publications. Contact Brian at brian@quiotix.com.