Java Basics: The Set Collections

weixin_34040079

License: CC 4.0 BY-SA

Tags: java, data structures and algorithms

Original link: https://my.oschina.net/wugong/blog/1791010

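This article describes the implementations of the Set collection in Java, namely HashSet, LinkedHashSet, TreeSet and EnumSet, along with their characteristics and typical uses. It covers how HashSet decides whether two elements are equal, how LinkedHashSet maintains insertion order, how TreeSet orders its elements, and why EnumSet is so efficient.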

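The basic characteristics of a Set are that it does not record insertion order and does not allow duplicate elements (think about why). The most commonly used implementation is HashSet.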

This article covers the following:

The HashSet class

Features of HashSet
equals and hashCode in HashSet

Features of LinkedHashSet
Features of TreeSet
Features of EnumSet

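The HashSet class

HashSet implements the Set interface directly; under the hood it is a wrapper around a HashMap. HashSet stores and retrieves elements by their hash codes, so it offers good access and lookup performance.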
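To make the "wrapper around a HashMap" idea concrete, here is a minimal sketch. MiniHashSet is a hypothetical class written for this article, not the JDK source; it only illustrates the general approach of storing each element as a map key with a shared dummy value.

```java
import java.util.HashMap;

// Hypothetical, simplified illustration of a set built on top of a HashMap.
class MiniHashSet<E> {
    // Shared dummy value stored against every key.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // put() returns null when the key was absent, i.e. the element is new.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove() returns the previous value, which is PRESENT if the key existed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false (duplicate key)
        System.out.println(set.contains("a")); // true
        System.out.println(set.size());        // 1
    }
}
```

Because the elements are map keys, the uniqueness of the set follows directly from the uniqueness of keys in the map.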

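Features of HashSet

It does not guarantee insertion order, and the iteration order may also change over time. This follows from HashSet placing elements by hash code: a resize of the underlying table can reorder elements, and an element whose hash code changes after insertion may no longer be found where it was stored.
HashSet is not thread-safe.
HashSet permits a null element.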
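The points above can be checked with a short sketch. The class name HashSetBasics and the helper addResults are made up for illustration; wrapping the set with Collections.synchronizedSet is one common way to share it between threads, not the only one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class HashSetBasics {
    // Adds a value and null twice each, and records what add() returned.
    static boolean[] addResults(Set<String> set) {
        boolean first = set.add("a");       // new element
        boolean dup = set.add("a");         // duplicate, set unchanged
        boolean firstNull = set.add(null);  // HashSet accepts null
        boolean dupNull = set.add(null);    // a second null is a duplicate
        return new boolean[] {first, dup, firstNull, dupNull};
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(Arrays.toString(addResults(set))); // [true, false, true, false]
        System.out.println(set.size());                       // 2
        System.out.println(set.contains(null));               // true

        // HashSet is not synchronized; a synchronized wrapper is one option.
        Set<String> shared = Collections.synchronizedSet(new HashSet<>());
        shared.add("x");
        System.out.println(shared.contains("x"));             // true
    }
}
```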

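equals and hashCode in HashSet

As mentioned earlier, a Set does not allow duplicate elements; allowing them would cause all sorts of strange problems. So how does HashSet decide that an element is a duplicate?

HashSet uses both equals and hashCode to decide whether two elements are equal. The rule is: two elements are considered equal (that is, duplicates) only if equals returns true for them and their hashCode values are also equal.

So if you override equals for objects stored in a HashSet, you must also override hashCode, so that any two objects for which equals returns true also return the same hashCode (and therefore land in the same position). Every key field used to compute hashCode() should also be used in the equals() comparison.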
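A minimal sketch of a class that follows this rule is shown below. Point and EqualsHashCodeDemo are hypothetical names chosen for the example; Objects.hash is just one convenient way to combine fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: equals and hashCode use the same two fields.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Point)) return false;
        Point other = (Point) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class EqualsHashCodeDemo {
    // Adds two equal points and one different point, returns the set size.
    static int distinctCount() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal and same hash: rejected
        points.add(new Point(2, 1));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```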

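Consider what happens if equals is overridden but hashCode is not: two objects that are equal under equals will most likely have different hash codes, so HashSet stores them as two separate elements. That contradicts the whole purpose of a HashSet (no duplicate elements).

Conversely, if two elements have the same hashCode but equals does not return true, HashSet stores both in the same bucket and chains the extra elements in a linked list (since Java 8, long chains are converted to a tree), which hurts HashSet's efficiency.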
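The collision case can be illustrated with a deliberately poor hashCode. Key and CollisionDemo are hypothetical classes, and the constant 42 is arbitrary; the sketch only shows that colliding but unequal elements are both kept, while equal ones are still rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose hashCode is legal but degenerate:
// every instance lands in the same bucket.
final class Key {
    private final String name;

    Key(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Key && name.equals(((Key) obj).name);
    }

    @Override
    public int hashCode() {
        return 42; // same value for all keys, so everything collides
    }
}

class CollisionDemo {
    static int sizeAfterAdds() {
        Set<Key> keys = new HashSet<>();
        keys.add(new Key("a"));
        keys.add(new Key("b")); // same hash, not equal: stored alongside "a"
        keys.add(new Key("a")); // same hash and equal: rejected
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // 2
    }
}
```

The set still behaves correctly here; the cost is that every lookup has to walk the elements sharing that one bucket.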

If equals is overridden but hashCode is not, HashSet may fail to work correctly, as in the following example.

package colection.HashSet;

import java.util.HashSet;

public class R {
    public int count;

    public R(int count) {
        this.count = count;
    }

    @Override
    public String toString() {
        return "R[count:" + count + " # hashCode:" + this.hashCode() + "]";
    }

    // Two R objects are equal when their count fields match.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj != null && obj.getClass() == R.class) {
            R r = (R) obj;
            return this.count == r.count;
        }
        return false;
    }

    /*
    // Uncomment so that equal objects also share a hash code.
    @Override
    public int hashCode() {
        return this.count;
    }
    */

    public static void main(String[] args) {
        HashSet<R> hs = new HashSet<>();
        hs.add(new R(5));
        hs.add(new R(-3));
        hs.add(new R(9));
        hs.add(new R(-2));
        // Looks up a new object that is equal to an existing element.
        System.out.println(hs.contains(new R(-3)));
        System.out.println(hs);
    }
}

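Because hashCode() is commented out above, each R falls back to Object.hashCode(), so contains() fails and you will see output like the following (the hash values differ from run to run).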

false  
[R[count:9 # hashCode:14927396], R[count:5 # hashCode:24417480], R[count:-2 # hashCode:31817359], R[count:-3 # hashCode:13884241]]

With hashCode() uncommented, the result is correct:

true  
[R[count:5 # hashCode:5], R[count:9 # hashCode:9], R[count:-3 # hashCode:-3], R[count:-2 # hashCode:-2]]

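Features of LinkedHashSet

LinkedHashSet is a subclass of HashSet. It also uses the hash code to decide where an element is stored, but in addition it keeps a linked list that records the insertion order of the elements. Insertion therefore has to compute the hashCode and also maintain the linked list, while traversal only needs to follow the list. The (decompiled) LinkedHashSet source looks like this: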

// LinkedHashSet source
public class LinkedHashSet extends HashSet  
    implements Set, Cloneable, Serializable  
{  
  
    public LinkedHashSet(int i, float f)  
    {  
        super(i, f, true);  
    }  
  
....

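In Java 7, LinkedHashSet defines no methods of its own, only four constructors; Java 8 adds a single spliterator() override. Every constructor calls the three-argument constructor of the parent class HashSet. The LinkedHashSet source is as follows: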

public class LinkedHashSet<E>
    extends HashSet<E>
    implements Set<E>, Cloneable, java.io.Serializable {

    private static final long serialVersionUID = -2851667679971038690L;

    /**
     * Constructs a new, empty linked hash set with the specified initial
     * capacity and load factor.
     *
     * @param      initialCapacity the initial capacity of the linked hash set
     * @param      loadFactor      the load factor of the linked hash set
     * @throws     IllegalArgumentException  if the initial capacity is less
     *               than zero, or if the load factor is nonpositive
     */
    public LinkedHashSet(int initialCapacity, float loadFactor) {
        super(initialCapacity, loadFactor, true);
    }

    /**
     * Constructs a new, empty linked hash set with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param   initialCapacity   the initial capacity of the LinkedHashSet
     * @throws  IllegalArgumentException if the initial capacity is less
     *              than zero
     */
    public LinkedHashSet(int initialCapacity) {
        super(initialCapacity, .75f, true);
    }

    /**
     * Constructs a new, empty linked hash set with the default initial
     * capacity (16) and load factor (0.75).
     */
    public LinkedHashSet() {
        super(16, .75f, true);
    }

    /**
     * Constructs a new linked hash set with the same elements as the
     * specified collection.  The linked hash set is created with an initial
     * capacity sufficient to hold the elements in the specified collection
     * and the default load factor (0.75).
     *
     * @param c  the collection whose elements are to be placed into
     *           this set
     * @throws NullPointerException if the specified collection is null
     */
    public LinkedHashSet(Collection<? extends E> c) {
        super(Math.max(2*c.size(), 11), .75f, true);
        addAll(c);
    }

    /**
     * Creates a <em><a href="Spliterator.html#binding">late-binding</a></em>
     * and <em>fail-fast</em> {@code Spliterator} over the elements in this set.
     *
     * <p>The {@code Spliterator} reports {@link Spliterator#SIZED},
     * {@link Spliterator#DISTINCT}, and {@code ORDERED}.  Implementations
     * should document the reporting of additional characteristic values.
     *
     * @implNote
     * The implementation creates a
     * <em><a href="Spliterator.html#binding">late-binding</a></em> spliterator
     * from the set's {@code Iterator}.  The spliterator inherits the
     * <em>fail-fast</em> properties of the set's iterator.
     * The created {@code Spliterator} additionally reports
     * {@link Spliterator#SUBSIZED}.
     *
     * @return a {@code Spliterator} over the elements in this set
     * @since 1.8
     */
    @Override
    public Spliterator<E> spliterator() {
        return Spliterators.spliterator(this, Spliterator.DISTINCT | Spliterator.ORDERED);
    }
}

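The three-argument HashSet constructor invoked here is package-private and backs the set with a LinkedHashMap instead of a plain HashMap. So LinkedHashSet is essentially built on LinkedHashMap: all of its set methods are inherited from HashSet, while its ability to maintain insertion order comes from LinkedHashMap.

Here is an example of LinkedHashSet maintaining insertion order: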

package colection.HashSet;  
  
import java.util.LinkedHashSet;  
  
public class LinkedHashSets {  
    public static void main(String[] args) {  
        LinkedHashSet lhs = new LinkedHashSet();  
        lhs.add("abc");  
        lhs.add("efg");  
        lhs.add("hij");  
        System.out.println(lhs);  
        lhs.remove(new String("efg"));  
        lhs.add("efg");  
        System.out.println(lhs);  
    }  
}

[abc, efg, hij]  
[abc, hij, efg]
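As a complement to the example above, the following sketch (InsertionOrderDemo is a made-up class name) contrasts two cases: adding an element that is already present leaves its position untouched, whereas removing it and adding it again moves it to the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InsertionOrderDemo {
    private static Set<String> sample() {
        Set<String> set = new LinkedHashSet<>();
        set.add("abc");
        set.add("efg");
        set.add("hij");
        return set;
    }

    // Adding an element that is already present does not change the order.
    static List<String> orderAfterReAdd() {
        Set<String> set = sample();
        set.add("efg");
        return new ArrayList<>(set);
    }

    // Removing and then adding puts the element at the end.
    static List<String> orderAfterRemoveAndAdd() {
        Set<String> set = sample();
        set.remove("efg");
        set.add("efg");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(orderAfterReAdd());        // [abc, efg, hij]
        System.out.println(orderAfterRemoveAndAdd()); // [abc, hij, efg]
    }
}
```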

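Features of TreeSet

TreeSet implements the SortedSet interface (via NavigableSet); as the name suggests, it is a sorted Set. The JDK source shows that it is backed by a TreeMap, which is a red-black tree. Because it is sorted, TreeSet offers extra methods, compared with HashSet, for accessing elements by sorted position, such as first(), last(), lower(), higher(), subSet(), headSet() and tailSet().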
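The position-based methods listed above can be tried on a small set of integers. NavigationDemo and its sample values are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.TreeSet;

class NavigationDemo {
    // Builds a sorted set from unsorted input: [-9, 2, 5, 10].
    static TreeSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(5, 2, 10, -9));
    }

    public static void main(String[] args) {
        TreeSet<Integer> ts = sample();
        System.out.println(ts);               // [-9, 2, 5, 10]
        System.out.println(ts.first());       // -9
        System.out.println(ts.last());        // 10
        System.out.println(ts.lower(5));      // 2  (greatest element < 5)
        System.out.println(ts.higher(5));     // 10 (smallest element > 5)
        System.out.println(ts.headSet(5));    // [-9, 2]  (upper bound excluded)
        System.out.println(ts.tailSet(5));    // [5, 10]  (lower bound included)
        System.out.println(ts.subSet(2, 10)); // [2, 5]   (from included, to excluded)
    }
}
```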

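TreeSet supports two kinds of ordering: natural ordering and custom ordering.

Natural ordering (ordering rule written in the element class)

TreeSet calls the compareTo method to compare elements and then sorts them in ascending order. Elements under natural ordering must therefore implement the Comparable interface; otherwise an exception is thrown. TreeSet also decides whether an element is a duplicate by calling the compareTo method that the element implements from Comparable: if it returns 0, the two elements are considered equal, that is, duplicates. Common Java classes such as String and Integer already implement Comparable. The example below shows the exception raised when an element that does not implement Comparable is added to a TreeSet.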

package collection.Set;  
  
import java.util.TreeSet;  
  
class Err {  
      
}  
  
public class TreeSets {  
  
    public static void main(String[] args) {  
        TreeSet ts =  new TreeSet();  
        ts.add(new Err());  
        ts.add(new Err());  
        System.out.println(ts);  
              
    }  
}

Running the program throws the following exception:

Exception in thread "main" java.lang.ClassCastException: collection.Set.Err cannot be cast to java.lang.Comparable  
    at java.util.TreeMap.compare(Unknown Source)  
    at java.util.TreeMap.put(Unknown Source)  
    at java.util.TreeSet.add(Unknown Source)  
    at collection.Set.TreeSets.main(TreeSets.java:13)

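Once the Err class above implements Comparable, the program runs normally (although, because this compareTo always returns 0, the second Err is treated as a duplicate and the set keeps only one element):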

class Err implements Comparable {  
    @Override  
    public int compareTo(Object o) {  
        // TODO Auto-generated method stub  
        return 0;  
    }  
}
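For comparison, here is a sketch of a compareTo that actually orders elements, using the generic Comparable<T> form. Version and NaturalOrderDemo are hypothetical names introduced for this example.

```java
import java.util.TreeSet;

// Hypothetical element class ordered by major number, then minor number.
final class Version implements Comparable<Version> {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class NaturalOrderDemo {
    static TreeSet<Version> sorted() {
        TreeSet<Version> versions = new TreeSet<>();
        versions.add(new Version(1, 10));
        versions.add(new Version(1, 2));
        versions.add(new Version(1, 2)); // compareTo returns 0: duplicate, ignored
        versions.add(new Version(0, 9));
        return versions;
    }

    public static void main(String[] args) {
        System.out.println(sorted()); // [0.9, 1.2, 1.10]
    }
}
```

In real code it is usually wise to override equals and hashCode as well, so that they agree with compareTo.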

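Another important point: because TreeSet calls the elements' compareTo method, all elements must be mutually comparable, which in practice means they should be of the same type; otherwise an exception is thrown. For example, the following code throws a ClassCastException: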

package collection.Set;  
  
import java.util.TreeSet;  
  
class Err implements Comparable {  
    @Override  
    public int compareTo(Object o) {  
        // TODO Auto-generated method stub  
        return 0;  
    }  
}  
  
public class TreeSets {  
  
    public static void main(String[] args) {  
        TreeSet ts =  new TreeSet();  
        ts.add(1);  
        ts.add("2");  
        System.out.println(ts);  
              
    }  
}

Output:

Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String  
    at java.lang.String.compareTo(Unknown Source)  
    at java.util.TreeMap.put(Unknown Source)  
    at java.util.TreeSet.add(Unknown Source)  
    at collection.Set.TreeSets.main(TreeSets.java:18)

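Custom ordering (ordering rule written in the collection)

TreeSet also supports custom ordering. In that case the set is associated with a Comparator object, which supplies the ordering logic. A lambda expression can stand in for the Comparator object; the example below uses an explicit Comparator class.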

package collection.Set;  
  
import java.util.Comparator;  
import java.util.TreeSet;  
  
class M {  
    int age;  
    public M(int age) {  
        this.age = age;  
    }  
      
    public String toString() {  
        return "M[age:" + age + "]";  
    }  
      
}  
  
class MyCommpare implements Comparator{  
      
    public int compare(Object o1, Object o2){  
        M m1 = (M)o1;  
        M m2 = (M)o2;  
        return m1.age >  m2.age ? 1 : m1.age < m2.age ? -1 : 0;         
    }  
  
}  
  
public class TreeSets {  
  
    public static void main(String[] args) {  
        TreeSet ts =  new TreeSet(new MyCommpare());      
        ts.add(new M(5));  
        ts.add(new M(3));  
        ts.add(new M(9));  
        System.out.println(ts);  
              
    }  
}

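Of course, the Comparator can also be written directly in the TreeSet constructor call as an anonymous class, as follows. Note that this version returns the comparison results reversed, so the elements are sorted in descending order of age.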

package collection.Set;  
  
import java.util.Comparator;  
import java.util.TreeSet;  
  
class M {  
    int age;  
    public M(int age) {  
        this.age = age;  
    }  
      
    public String toString() {  
        return "M[age:" + age + "]";  
    }  
      
}  
  
public class TreeSets {  
  
    public static void main(String[] args) {  
        TreeSet ts =  new TreeSet(new Comparator() {  
            public int compare(Object o1, Object o2) {  
                M m1 = (M)o1;  
                M m2 = (M)o2;  
                return m1.age >  m2.age ? -1 : m1.age < m2.age ? 1 : 0;     
            }  
        });   
        ts.add(new M(5));  
        ts.add(new M(3));  
        ts.add(new M(9));  
        System.out.println(ts);  
              
    }  
}
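With Java 8 or later, the same custom ordering can be written more compactly. The sketch below assumes a class M like the one above (redeclared here as a nested class so the block is self-contained); LambdaOrderDemo is a made-up name. It shows a lambda for ascending order and Comparator.comparingInt with reversed() for descending order.

```java
import java.util.Comparator;
import java.util.TreeSet;

class LambdaOrderDemo {
    // Same shape as the M class used in the examples above.
    static final class M {
        final int age;

        M(int age) {
            this.age = age;
        }

        @Override
        public String toString() {
            return "M[age:" + age + "]";
        }
    }

    // Ascending by age, comparator written as a lambda.
    static TreeSet<M> ascending() {
        TreeSet<M> ts = new TreeSet<>((M a, M b) -> Integer.compare(a.age, b.age));
        ts.add(new M(5));
        ts.add(new M(3));
        ts.add(new M(9));
        return ts;
    }

    // Descending by age, built from a key extractor and reversed().
    static TreeSet<M> descending() {
        Comparator<M> byAge = Comparator.comparingInt(m -> m.age);
        TreeSet<M> ts = new TreeSet<>(byAge.reversed());
        ts.add(new M(5));
        ts.add(new M(3));
        ts.add(new M(9));
        return ts;
    }

    public static void main(String[] args) {
        System.out.println(ascending());  // [M[age:3], M[age:5], M[age:9]]
        System.out.println(descending()); // [M[age:9], M[age:5], M[age:3]]
    }
}
```

Integer.compare also avoids the nested ternary expression and any risk of overflow from subtracting two ints.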

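Features of EnumSet

As the name suggests, EnumSet is a set designed specifically for enum types, so its elements must be values of an enum type; otherwise an exception is thrown (or, with generics, the code does not compile). An EnumSet is also ordered: its order is the order in which the constants are declared in the enum class. Access to an EnumSet is very fast, and so are bulk operations. EnumSet mainly provides the following methods: allOf, complementOf, copyOf, noneOf, of, range and so on. Note that EnumSet exposes no public constructors; to create an EnumSet, call a static factory method such as allOf. Here is an example.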

package collection.Set;  
  
import java.util.EnumSet;  
  
enum Season  
{  
    SPRING, SUMMER, FALL, WINTER  
}  
public class EnumSets {  
  
    public static void main(String[] args) {  
        // Must be initialized with the element's Class object, i.e. Season.class
        EnumSet es1 = EnumSet.allOf(Season.class);  
        System.out.println(es1);  
        EnumSet es2 = EnumSet.noneOf(Season.class);  
        es2.add(Season.WINTER);  
        es2.add(Season.SUMMER);  
        System.out.println(es2);  
        EnumSet es3 = EnumSet.of(Season.WINTER, Season.SUMMER);  
        System.out.println(es3);  
        EnumSet es4 = EnumSet.range(Season.SUMMER, Season.WINTER);  
        System.out.println(es4);  
        EnumSet es5 = EnumSet.complementOf(es4);  
        System.out.println(es5);  
    }  
}

Output:

[SPRING, SUMMER, FALL, WINTER]  
[SUMMER, WINTER]  
[SUMMER, WINTER]  
[SUMMER, FALL, WINTER]  
[SPRING]
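The example above uses raw types. A generically typed variant might look like the sketch below, where TypedEnumSetDemo is a hypothetical class and Season is redeclared as a nested enum to keep the block self-contained. It also shows that iteration follows declaration order rather than insertion order.

```java
import java.util.EnumSet;

class TypedEnumSetDemo {
    enum Season { SPRING, SUMMER, FALL, WINTER }

    // Elements are added out of order on purpose.
    static EnumSet<Season> cold() {
        EnumSet<Season> set = EnumSet.noneOf(Season.class);
        set.add(Season.WINTER);
        set.add(Season.SPRING);
        return set;
    }

    public static void main(String[] args) {
        EnumSet<Season> cold = cold();
        System.out.println(cold);                          // [SPRING, WINTER]
        System.out.println(EnumSet.complementOf(cold));    // [SUMMER, FALL]

        // copyOf creates an independent copy; changing it leaves 'cold' as it was.
        EnumSet<Season> copy = EnumSet.copyOf(cold);
        copy.remove(Season.SPRING);
        System.out.println(copy);                          // [WINTER]
        System.out.println(cold.contains(Season.SPRING));  // true
    }
}
```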

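Performance comparison of the Set implementations

HashSet and TreeSet are the most commonly used Set implementations. HashSet generally outperforms TreeSet for adding and looking up elements, because it does not need to maintain element order.
LinkedHashSet uses an extra linked list to maintain insertion order, so insertion is slightly slower than with HashSet, but iteration (traversal) is faster: insertion has to compute the hashCode and also maintain the linked list, whereas traversal simply follows the list.
EnumSet performs best of all the Set implementations, but it can only hold values of an enum type.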

Reposted from: https://my.oschina.net/wugong/blog/1791010