Details of Java Collection Iterators and ConcurrentModificationException

License: CC 4.0 BY-SA


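This article examines how the List, Set and Map interfaces of the Java Collections Framework are traversed. It analyzes the iterator designs of LinkedList and HashMap in detail, including their internal implementation and how modifications made during traversal are detected and reported.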

Implementations of all three collection interfaces in the Java Collections Framework, List, Set and Map, can be traversed.

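List and Set hold single values; both interfaces extend Collection, which in turn extends Iterable. A Map holds key/value entries. Map does not extend Collection, but its keys and values are exposed as separately traversable views (keySet() and values(), plus entrySet() for the entries). Any Iterable can be used in a for-each loop, and the Iterator interface gives all of these classes the same, consistent traversal contract.

Note that several Set implementations delegate to a corresponding Map: HashSet is backed by a HashMap and TreeSet by a TreeMap. The set's elements are the map's keys, and the values are ignored, so traversing such a Set is really traversing the map's keys.

Because List and Map each have several implementations (ArrayList and LinkedList for List; HashMap and TreeMap for Map), the rest of this article uses LinkedList and HashMap to discuss iterator design.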
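A minimal sketch of what the paragraphs above describe: a for-each loop over any Iterable is compiled to roughly an iterator()/hasNext()/next() loop, and although Map itself is not Iterable, its keySet() and values() views are. The class name ForEachDemo is hypothetical; TreeMap is used only so that the key order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ForEachDemo {
    // Roughly what `for (String s : source) out.add(s);` compiles to.
    static List<String> desugared(Iterable<String> source) {
        List<String> out = new ArrayList<>();
        for (Iterator<String> it = source.iterator(); it.hasNext(); ) {
            String s = it.next();
            out.add(s);
        }
        return out;
    }

    // Map is not Iterable, but its key and value views are.
    static List<String> keysThenValues(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        for (String key : map.keySet()) {
            out.add(key);
        }
        for (Integer value : map.values()) {
            out.add(String.valueOf(value));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(desugared(Arrays.asList("one", "two", "three"))); // [one, two, three]
        Map<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(keysThenValues(map)); // [a, b, 1, 2]
    }
}
```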

The LinkedList iterator
Both LinkedList and ArrayList implement List. ArrayList is backed by a resizable array, whereas LinkedList is a true linked list. Its main characteristics are:

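It is a doubly linked list. The class keeps references to both the head and the tail (the fields first and last), so traversal can start at either end, and from any node in the middle you can move both forward and backward.
It permits null elements.
It is not synchronized.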

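Note that, unlike the legacy Vector and Hashtable, the classes of the Java Collections Framework are deliberately designed without locking. For multithreaded access, wrap them with one of the Collections.synchronizedXxx methods, such as Collections.synchronizedList.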
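A minimal sketch, assuming Java 8+, of the Collections.synchronizedList wrapper mentioned above. The wrapper makes individual calls such as add() thread-safe, but a whole traversal is not atomic: the wrapper's Javadoc requires callers to synchronize on the returned list themselves while iterating. SyncListDemo and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class SyncListDemo {
    // Each call on the wrapper is synchronized, but a traversal spans many
    // calls, so the caller must hold the list's lock for the whole loop.
    static int sumWhileLocked(List<Integer> syncList) {
        int sum = 0;
        synchronized (syncList) {
            Iterator<Integer> it = syncList.iterator();
            while (it.hasNext()) {
                sum += it.next();
            }
        }
        return sum;
    }

    // One thread appends while this thread repeatedly traverses under the lock.
    static int writeWhileReading(int n) {
        List<Integer> list = Collections.synchronizedList(new LinkedList<>());
        Thread writer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                list.add(1); // locks the same mutex as synchronized (list)
            }
        });
        writer.start();
        while (writer.isAlive()) {
            sumWhileLocked(list); // adds wait for the lock, so no ConcurrentModificationException
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sumWhileLocked(list);
    }

    public static void main(String[] args) {
        System.out.println(writeWhileReading(10_000)); // 10000
    }
}
```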

LinkedList's iterator implements ListIterator, which extends the basic Iterator interface and adds the operations that a doubly linked list makes possible. The ListIterator interface is (abridged from ListIterator.java):

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations
    boolean hasNext();
    E next();
    boolean hasPrevious();
    E previous();
    int nextIndex();
    int previousIndex();

    // Modification Operations
    void remove();
    void set(E e);
    void add(E e);
}

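Once the iterator has returned an element via next() or previous(), the modification methods must follow these rules (a violation throws IllegalStateException):

remove() may be called at most once per call to next() or previous(), and not at all if add() has been called since that call.
add() may be called any number of times, including after remove().
After remove() or add(), set() is not allowed until the next call to next() or previous().
After set(), remove() and add() are both still allowed.
set() may be called repeatedly (although that is rarely useful in practice).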
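The rules above can be checked directly against LinkedList's iterator. This sketch is written for illustration (the class ListIteratorRules and its helpers are hypothetical); each method performs one call sequence and reports either whether IllegalStateException was thrown or what the list looks like afterwards. The ListIterator Javadoc states the same rules, but only LinkedList is exercised here.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ListIteratorRules {
    static List<String> newList() {
        return new LinkedList<>(Arrays.asList("a", "b"));
    }

    // True if the action throws IllegalStateException.
    static boolean rejects(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // next() -> add() -> remove(): remove() is rejected after add().
    static boolean removeAfterAdd() {
        ListIterator<String> it = newList().listIterator();
        it.next();
        it.add("x");
        return rejects(it::remove);
    }

    // next() -> remove() -> remove(): at most one remove() per next()/previous().
    static boolean removeTwice() {
        ListIterator<String> it = newList().listIterator();
        it.next();
        it.remove();
        return rejects(it::remove);
    }

    // next() -> remove() -> set(): set() is rejected after remove().
    static boolean setAfterRemove() {
        ListIterator<String> it = newList().listIterator();
        it.next();
        it.remove();
        return rejects(() -> it.set("y"));
    }

    // next() -> add() -> add(): repeated add() is allowed.
    static List<String> addTwice() {
        List<String> list = newList();
        ListIterator<String> it = list.listIterator();
        it.next();   // "a"
        it.add("x");
        it.add("y");
        return list; // [a, x, y, b]
    }

    // next() -> remove() -> add(): add() is still allowed after remove().
    static List<String> removeThenAdd() {
        List<String> list = newList();
        ListIterator<String> it = list.listIterator();
        it.next();   // "a"
        it.remove();
        it.add("x");
        return list; // [x, b]
    }

    // next() -> set() -> set() -> remove(): all allowed.
    static List<String> setSetRemove() {
        List<String> list = newList();
        ListIterator<String> it = list.listIterator();
        it.next();   // "a"
        it.set("p");
        it.set("q");
        it.remove();
        return list; // [b]
    }
}
```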

The following example shows how LinkedList's iterator is used and how it behaves.

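import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.concurrent.CountDownLatch;

public class LinkedListTest {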
   final List<String> mList = new LinkedList<>();
   final CountDownLatch mLatch = new CountDownLatch(1);
   
   public LinkedListTest() {
       init();
   }
   
   private void init() {
       mList.add("one");
       mList.add("two");
       mList.add("three");
       mList.add("four");
   }
   
   // Traverse
   public void iteratorTest1() {
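       // The cast below also works for LinkedList, whose iterator() happens to
       // return a ListIterator, but listIterator() is the portable way.
       //final ListIterator<String> iterator = (ListIterator<String>) mList.iterator();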
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println(s);
   	}
   }
   
   // Traverse reverse
   public void iteratorTest2() {        
       final ListIterator<String> iterator = mList.listIterator(mList.size());
       while (iterator.hasPrevious()) {
           final String s = iterator.previous();
           System.out.println("Current:" + s);
       }
   }
   
   // Traverse from the given position.
   public void iteratorTest3(int index) {
       if (index > mList.size()) {
           index = mList.size();
       }
       if (index < 0) {
           index = 0;
       }
       
       final ListIterator<String> iterator = mList.listIterator(index);
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("Current:" + s);
       }
   }
   	
   // Remove
   public void iteratorTest4(String removed) {
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("Current:" + s);
           if (s.equals(removed)) {
               iterator.remove();
           }            
       }
   }
   
   // Add
   public void iteratorTest5(String before, String added) {
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("Current:" + s);
           if (s.equals(before)) {
               iterator.add(added);                
           }
       }	    
   }
   
   // Set
   public void iteratorTest6(String from, String to) {
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("Current:" + s);
           if (s.equals(from)) {
               iterator.set(to);                
           }
       }   	    
   }
   
   // Error case
   public void iteratorTest7(String removed) {
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("Current:" + s);
           if (s.equals(removed)) {
               mList.remove(removed);
           } 
       } 	    
   }
  
   // Error case. multi-thread access.
   public void iteratorTest8() {
       final ListIterator<String> iterator = mList.listIterator();
       while (iterator.hasNext()) {
           final String s = iterator.next();
           System.out.println("current:" + s + " thread:" + Thread.currentThread());
           iterator.remove();
           
           if (Thread.currentThread().getName().equals("thread#1")) {
               try {
                   mLatch.await();                  
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt(); // restore the interrupt flag
               }
           } else {
               mLatch.countDown();                
           }
       }	    
   }
   
   private final Runnable runnable = () -> {
       iteratorTest8();
   };

   public void multiThreadAccess() {
       final Thread t = new Thread(runnable, "thread#1");
       t.start();
       try {
           Thread.sleep(200);
       } catch (InterruptedException e) {
           Thread.currentThread().interrupt(); // restore the interrupt flag
       }
       iteratorTest8();
   }
   
   public int size() {
       return mList.size();
   }
   
   public void dump() {	    
       final StringBuilder builder = new StringBuilder();
       for (String s : mList) {
           builder.append(" ");
           builder.append(s);	        	        
       }
       builder.append(" ");
       System.out.println("[" + builder.toString() + "]");	    
   }
   
   public static void main(String[] argv) {
       final LinkedListTest l = new LinkedListTest();	    
       if (argv.length > 0) {
           final String s = argv[0];
           switch (s) {
           case "1":
               l.iteratorTest1();
               break;	            
           case "2":
               l.iteratorTest2();
               break;                
           case "3":
               l.iteratorTest3(Integer.parseInt(argv[1]));
               break;                
           case "4":
               l.iteratorTest4(argv[1]);
               break;                
           case "5":
               l.iteratorTest5(argv[1], argv[2]);
               break;                
           case "6":
               l.iteratorTest6(argv[1], argv[2]);
               break; 
           case "7":
               l.iteratorTest7(argv[1]);
               break;
           case "8":
               l.multiThreadAccess();
               break;	        
           }
           l.dump();
       }
   }
}

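iteratorTest1 is a plain forward traversal. For a LinkedList you can call listIterator() to get a ListIterator directly, or call iterator() and cast the result to ListIterator. The cast happens to work because LinkedList.iterator() (inherited from AbstractSequentialList) returns listIterator(), but this is not guaranteed for List in general, so listIterator() is the safer choice. A ListIterator offers operations that the basic Iterator lacks, although a plain Iterator is enough to walk a LinkedList. If the traversal only reads elements, does not modify the list, and goes from head to tail, a for-each loop is the simplest option.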

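iteratorTest2 traverses from the tail to the head. It passes the list size to listIterator(int), which places the cursor after the last element, and then walks backward with hasPrevious() and previous().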

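iteratorTest3 traverses forward from a given position; traversing backward from a given position works the same way with hasPrevious() and previous().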
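A small sketch of positioned traversal in both directions; the class PositionedTraversal is hypothetical. listIterator(index) places the cursor just before the element at index, so a forward walk starts with that element and a backward walk starts with the one before it; passing size() gives the full reverse walk of iteratorTest2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class PositionedTraversal {
    // listIterator(index) puts the cursor between elements index-1 and index.

    // The first next() returns list.get(index); walks toward the tail.
    static List<String> forwardFrom(List<String> list, int index) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = list.listIterator(index);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // The first previous() returns list.get(index - 1); walks toward the head.
    static List<String> backwardFrom(List<String> list, int index) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = list.listIterator(index);
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two", "three", "four"));
        System.out.println(forwardFrom(list, 2));            // [three, four]
        System.out.println(backwardFrom(list, 2));           // [two, one]
        System.out.println(backwardFrom(list, list.size())); // [four, three, two, one]
    }
}
```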

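iteratorTest4 processes the current element and then deletes it from the list through the iterator's remove().

iteratorTest5 uses the iterator's add() to insert a new element immediately after the given element. The new element is inserted before the cursor, so the following next() does not return it.

iteratorTest6 uses the iterator's set() to replace the value of the given element.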
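A minimal sketch (hypothetical class ModifyThroughIterator) that records what the traversal actually visits while add() and set() are used. It makes visible that the element inserted by add() is not returned by the following next(), while set() only changes a value in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ModifyThroughIterator {
    // Like iteratorTest5, but also records which elements next() returned.
    static List<String> visitAndInsert(List<String> list, String before, String added) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            visited.add(s);
            if (s.equals(before)) {
                it.add(added); // goes in before the cursor, so next() skips it
            }
        }
        return visited;
    }

    // Like iteratorTest6: replaces every occurrence of `from` with `to`.
    static void replaceAll(List<String> list, String from, String to) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(from)) {
                it.set(to);
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two", "three"));
        System.out.println(visitAndInsert(list, "two", "2.5")); // [one, two, three]
        System.out.println(list);                               // [one, two, 2.5, three]
        replaceAll(list, "2.5", "two-and-a-half");
        System.out.println(list);                               // [one, two, two-and-a-half, three]
    }
}
```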

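iteratorTest7 removes the current element through the list's own remove() instead of the iterator's. This is not allowed and normally causes a ConcurrentModificationException. When the removed element is the last one ("four" in the example), no exception is thrown, because the traversal has already finished. When it is the second-to-last one ("three"), no exception is thrown either, but the iterator silently skips the last element. This is an easy source of bugs, and the behavior depends on the iterator's internal implementation. The conclusion: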

While an iteration is in progress, never modify the list through the list's own methods; modify it only through the iterator.
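The three cases described for iteratorTest7 can be reproduced deterministically with OpenJDK's LinkedList; the class UnsafeRemoval is hypothetical. As noted above, the outcomes depend on the iterator's internals, so this illustrates one implementation rather than guaranteed behavior for every List. The sketch also shows Collection.removeIf (Java 8+) as a safe alternative to Iterator.remove().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class UnsafeRemoval {
    static List<String> sample() {
        return new LinkedList<>(Arrays.asList("one", "two", "three", "four"));
    }

    // WRONG: removes through the list, bypassing the iterator (like iteratorTest7).
    // Returns the elements the loop actually visited.
    static List<String> removeViaList(List<String> list, String target) {
        List<String> visited = new ArrayList<>();
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            String s = it.next();
            visited.add(s);
            if (s.equals(target)) {
                list.remove(target);
            }
        }
        return visited;
    }

    // Safe: Collection.removeIf, whose default implementation removes
    // through Iterator.remove().
    static void removeSafely(List<String> list, String target) {
        list.removeIf(s -> s.equals(target));
    }

    public static void main(String[] args) {
        try {
            removeViaList(sample(), "one");
        } catch (ConcurrentModificationException e) {
            System.out.println("removing \"one\": ConcurrentModificationException");
        }
        System.out.println(removeViaList(sample(), "three")); // [one, two, three]: "four" never visited
        List<String> safe = sample();
        removeSafely(safe, "three");
        System.out.println(safe); // [one, two, four]
    }
}
```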

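multiThreadAccess tests modification through the iterator from two threads. In a single thread, modifying the list with the iterator's remove() and add() does not throw ConcurrentModificationException. Here, however, each thread has its own iterator, and a removal made through one iterator does not update the other iterator's expectedModCount, so the waiting thread typically fails with ConcurrentModificationException on its next call to next(). Because the list is accessed without synchronization, this detection is best-effort and not guaranteed; the same failure occurs deterministically with two iterators in a single thread.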
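The mechanism behind multiThreadAccess does not actually require threads: every iterator keeps its own expectedModCount, so a modification made through one iterator breaks every other iterator over the same list. This single-threaded sketch (hypothetical class TwoIterators) shows it without any timing dependence.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class TwoIterators {
    // A structural change made through iterator `a` updates a's
    // expectedModCount and the list's modCount, but not b's snapshot.
    static boolean secondIteratorFails() {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two", "three"));
        Iterator<String> a = list.iterator();
        Iterator<String> b = list.iterator();
        a.next();
        a.remove(); // fine for a
        try {
            b.next(); // b sees modCount != its expectedModCount
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorFails()); // true
    }
}
```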

The internals of LinkedList's iterator follow.
Calling listIterator() on a list executes (AbstractList.java):

    public ListIterator<E> listIterator() {
        return listIterator(0);
    }

listIterator(int) is implemented as (LinkedList.java):

    public ListIterator<E> listIterator(int index) {
        checkPositionIndex(index);
        return new ListItr(index);
    }

That is, it creates an iterator object. The iterator class ListItr is defined as (LinkedList.java):

    private class ListItr implements ListIterator<E> {
        private Node<E> lastReturned;
        private Node<E> next;
        private int nextIndex;
        private int expectedModCount = modCount;

        ListItr(int index) {
            // assert isPositionIndex(index);
            next = (index == size) ? null : node(index);
            nextIndex = index;
        }

        public boolean hasNext() {
            return nextIndex < size;
        }

        public E next() {
            checkForComodification();
            if (!hasNext())
                throw new NoSuchElementException();

            lastReturned = next;
            next = next.next;
            nextIndex++;
            return lastReturned.item;
        }

        public boolean hasPrevious() {
            return nextIndex > 0;
        }

        public E previous() {
            checkForComodification();
            if (!hasPrevious())
                throw new NoSuchElementException();

            lastReturned = next = (next == null) ? last : next.prev;
            nextIndex--;
            return lastReturned.item;
        }

        public int nextIndex() {
            return nextIndex;
        }

        public int previousIndex() {
            return nextIndex - 1;
        }

        public void remove() {
            checkForComodification();
            if (lastReturned == null)
                throw new IllegalStateException();

            Node<E> lastNext = lastReturned.next;
            unlink(lastReturned);
            if (next == lastReturned)
                next = lastNext;
            else
                nextIndex--;
            lastReturned = null;
            expectedModCount++;
        }

        public void set(E e) {
            if (lastReturned == null)
                throw new IllegalStateException();
            checkForComodification();
            lastReturned.item = e;
        }

        public void add(E e) {
            checkForComodification();
            lastReturned = null;
            if (next == null)
                linkLast(e);
            else
                linkBefore(e, next);
            nextIndex++;
            expectedModCount++;
        }

        public void forEachRemaining(Consumer<? super E> action) {
            Objects.requireNonNull(action);
            while (modCount == expectedModCount && nextIndex < size) {
                action.accept(next.item);
                lastReturned = next;
                next = next.next;
                nextIndex++;
            }
            checkForComodification();
        }

        final void checkForComodification() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }
    }

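When the iterator is created, its fields next, nextIndex and expectedModCount are initialized. next is the node that the following call to next() will return (null when the cursor is at the end); nextIndex is that node's index and decides whether the traversal has finished. expectedModCount supports fail-fast detection during traversal: when the check fails, a ConcurrentModificationException is thrown. It starts out equal to modCount, a field LinkedList inherits from AbstractList that counts structural modifications of the list: every node added or removed increments it.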
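Only structural modifications (ones that change the number of nodes) touch modCount. A quick sketch, assuming OpenJDK's LinkedList, whose set(int, E) only overwrites the node's item: replacing values through the list during iteration does not trip the fail-fast check, unlike add or remove. The class name is hypothetical, and the iterator's own set() remains the cleaner way to do this.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ValueChangeDuringIteration {
    // Upper-cases every element through list.set(...) while iterating.
    // set() replaces a value without adding or removing a node, so
    // LinkedList does not increment modCount and the iterator keeps going.
    static List<String> upperCaseInPlace(List<String> list) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            int index = it.nextIndex();
            String s = it.next();
            list.set(index, s.toUpperCase()); // not a structural modification
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two", "three"));
        System.out.println(upperCaseInPlace(list)); // [ONE, TWO, THREE]
    }
}
```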

The LinkedList field size holds the number of nodes. The nested class Node represents one node of the list and is defined as (LinkedList.java):

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

Node stores the element together with references to the previous and next nodes, which is what makes LinkedList a doubly linked list.

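hasNext() determines whether forward traversal is finished: traversal continues as long as the cursor index nextIndex is less than the list size.

hasPrevious() determines whether backward traversal is finished: if nextIndex is greater than 0, there is an element before the cursor and traversal can continue.

next() returns the current element. It records the node in lastReturned, advances next to the following node, increments nextIndex, and returns the node's value.

set() replaces the value of the node last returned. Because it only swaps a value, it changes neither next nor nextIndex (nor modCount).

remove() deletes the node last returned. It first calls unlink() to detach the node, then fixes the cursor. If the last move was next(), the removed node lies before the cursor, so nextIndex is decremented to keep pointing at the same next node. If the last move was previous(), the removed node is the iterator's own next node, so next moves to the removed node's successor and nextIndex stays the same. Finally, lastReturned is cleared (which is why a second remove() throws IllegalStateException) and expectedModCount is incremented to match the modCount++ performed in unlink().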
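The two branches of remove() can be observed through the public nextIndex() method. A sketch with a hypothetical class name; the expected values follow from the ListItr source quoted above.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class RemoveCursorDemo {
    // next() then remove(): the removed node was before the cursor,
    // so nextIndex drops from 1 to 0 and the next element is still "b".
    static String afterForwardRemove() {
        ListIterator<String> it = new LinkedList<>(Arrays.asList("a", "b", "c")).listIterator();
        it.next(); // "a", nextIndex == 1
        it.remove();
        return it.nextIndex() + ":" + it.next();
    }

    // previous() then remove(): the removed node was the iterator's `next`,
    // so `next` moves to its successor (null here) and nextIndex stays 2.
    static String afterBackwardRemove() {
        List<String> list = new LinkedList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator(list.size());
        it.previous(); // "c", nextIndex == 2
        it.remove();
        return it.nextIndex() + ":" + it.previous();
    }

    public static void main(String[] args) {
        System.out.println(afterForwardRemove());  // 0:b
        System.out.println(afterBackwardRemove()); // 2:b
    }
}
```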

unlink() is implemented as (LinkedList.java):

    E unlink(Node<E> x) {
        // assert x != null;
        final E element = x.item;
        final Node<E> next = x.next;
        final Node<E> prev = x.prev;

        if (prev == null) {
            first = next;
        } else {
            prev.next = next;
            x.prev = null;
        }

        if (next == null) {
            last = prev;
        } else {
            next.prev = prev;
            x.next = null;
        }

        x.item = null;
        size--;
        modCount++;
        return element;
    }

The method has two parts: first it removes the node from the doubly linked list, then it decrements size and increments modCount.

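add() inserts a new node before the cursor, that is, right after the node last returned by next(). If the traversal has reached the end (next is null), the new node is appended at the tail with linkLast(); otherwise it is inserted before next with linkBefore(). It then increments nextIndex, so the new node is not returned by the following next(), and increments expectedModCount to match the modCount change. It also clears lastReturned, which is why a following remove() or set() throws IllegalStateException.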
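A sketch of the tail case (hypothetical class AddAtEnd): once the traversal has passed the last element, add() appends, and because the new node is inserted before the cursor, a following previous() returns it, as the ListIterator Javadoc specifies. The same path is taken for an empty list.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class AddAtEnd {
    // Walks to the end (the iterator's internal `next` becomes null),
    // then add() appends at the tail; previous() returns the new element.
    static String appendAfterTraversal(List<String> list, String extra) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            it.next();
        }
        it.add(extra);
        return it.previous();
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two"));
        System.out.println(appendAfterTraversal(list, "three")); // three
        System.out.println(list);                                // [one, two, three]
    }
}
```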

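Summary:

The remove() and add() methods show why modifying the list through the iterator does not throw ConcurrentModificationException: they update the list's modCount and the iterator's expectedModCount together. Modifying the list through its own methods during traversal updates only modCount, so the iterator's next check throws ConcurrentModificationException.
After the iterator returns the second-to-last node, nextIndex equals size - 1. If that node is then removed through the list, size is decremented and becomes equal to nextIndex, so hasNext() returns false and the loop ends. No ConcurrentModificationException is thrown, but the last node is never visited, which makes for an easy bug.
If an element is added through the list during traversal, ConcurrentModificationException is thrown no matter where nextIndex is: adding increments size and modCount while nextIndex and expectedModCount stay unchanged, so hasNext() still returns true (nextIndex < size) and the following next() fails the modCount check.
Therefore, never add or remove elements through the list itself while iterating over it.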
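The claim about additions can be checked at every position, including right after the last element, where one might expect the loop to end quietly. A sketch with a hypothetical class name, assuming OpenJDK's LinkedList:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class AddViaList {
    // Adds one element through the list right after the element at position
    // `stopAt` has been returned; reports whether the iterator then failed.
    static boolean addViaListFails(int stopAt) {
        List<String> list = new LinkedList<>(Arrays.asList("one", "two", "three", "four"));
        Iterator<String> it = list.iterator();
        int position = 0;
        try {
            while (it.hasNext()) {
                it.next();
                if (position++ == stopAt) {
                    list.add("five"); // size and modCount change, nextIndex does not
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + addViaListFails(i)); // true for every position
        }
    }
}
```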

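The HashMap iterator
The keys and values of a Map can be traversed separately, and HashSet, for example, is built on top of a HashMap's keys. In HashMap, the keys are exposed through the class KeySet and the values through the class Values, declared as: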

final class KeySet extends AbstractSet<K>

final class Values extends AbstractCollection<V>

AbstractSet and AbstractCollection both implement Iterable (through Collection).

KeySet's iterator() is implemented as (HashMap.java):
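Because KeySet and Values are views rather than copies, iterating them walks the map's own nodes, and removal through the view's iterator removes the whole entry from the map. A minimal sketch (hypothetical class KeySetView):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class KeySetView {
    // keySet() is backed by the map: removing through its iterator
    // removes the entry from the map itself.
    static Map<String, Integer> removeEvenValues(Map<String, Integer> map) {
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            if (map.get(key) % 2 == 0) { // reading the map is not a modification
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.put("d", 4);
        System.out.println(removeEvenValues(map).keySet()); // a and c remain (iteration order not guaranteed)
    }
}
```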

public final Iterator<K> iterator()     { return new KeyIterator(); }

The key iterator class KeyIterator is defined as (HashMap.java):

    final class KeyIterator extends HashIterator
        implements Iterator<K> {
        public final K next() { return nextNode().key; }
    }

KeyIterator extends HashIterator and implements next() (HashIterator itself declares no next() method). HashIterator is defined as (HashMap.java):

    abstract class HashIterator {
        Node<K,V> next;        // next entry to return
        Node<K,V> current;     // current entry
        int expectedModCount;  // for fast-fail
        int index;             // current slot

        HashIterator() {
            expectedModCount = modCount;
            Node<K,V>[] t = table;
            current = next = null;
            index = 0;
            if (t != null && size > 0) { // advance to first entry
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Node<K,V> nextNode() {
            Node<K,V>[] t;
            Node<K,V> e = next;
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (e == null)
                throw new NoSuchElementException();
            if ((next = (current = e).next) == null && (t = table) != null) {
                do {} while (index < t.length && (next = t[index++]) == null);
            }
            return e;
        }

        public final void remove() {
            Node<K,V> p = current;
            if (p == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            current = null;
            K key = p.key;
            removeNode(hash(key), key, null, false, false);
            expectedModCount = modCount;
        }
    }

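The constructor sets expectedModCount to HashMap's modCount. table is the HashMap's bucket array; each bucket holds a singly linked collision chain (or, since Java 8, a red-black tree once a chain grows long; tree nodes still keep their next links, so the iterator walks them the same way). size is the number of entries. The iterator field index is used to find the next non-empty bucket, and next is the node to be returned next.

hasNext() checks whether the traversal is finished: a non-null next means entries remain.

nextNode() returns the next node. The returned Node holds both the key and the value; for key traversal, KeyIterator's next() simply returns nextNode().key. nextNode() also advances next to the following node in the same chain. If the chain is exhausted (next is null), the following code searches for the next non-empty bucket: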

if ((next = (current = e).next) == null && (t = table) != null) {
    do {} while (index < t.length && (next = t[index++]) == null);
}

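remove() deletes the node last returned from the HashMap by calling removeNode(). As with LinkedList, modifying the underlying structure through the iterator is safe because the iterator resynchronizes expectedModCount with modCount afterwards, so no ConcurrentModificationException is thrown.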
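A sketch contrasting the two ways to remove while iterating over a HashMap (hypothetical class HashMapRemoval). The failing case is deterministic whenever the map has at least two entries, because nextNode() has already located the following entry before the loop body runs.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class HashMapRemoval {
    static Map<Integer, String> sample() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        return map;
    }

    // WRONG: map.remove() bypasses the iterator. `next` already points at
    // another entry, so hasNext() is true and the following nextNode()
    // fails the modCount check.
    static boolean removeViaMapFails() {
        Map<Integer, String> map = sample();
        try {
            for (Integer key : map.keySet()) {
                map.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Right: Iterator.remove() resynchronizes expectedModCount.
    static Map<Integer, String> removeViaIterator() {
        Map<Integer, String> map = sample();
        Iterator<Integer> it = map.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(removeViaMapFails());           // true
        System.out.println(removeViaIterator().isEmpty()); // true
    }
}
```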

Summary:

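HashIterator.hasNext() tests the next reference rather than comparing an index with size, because HashMap is a sparse structure in which many buckets may be empty. The iterator advances to the next non-empty bucket eagerly (in the constructor and in nextNode()), so hasNext() only has to check next != null.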
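To make the role of the pre-advanced next reference concrete, here is a heavily simplified model of the same idea (hypothetical class SparseTable, not HashMap's code; no hashing, resizing or modCount): an array of singly linked chains whose iterator always parks next on the next available node, skipping empty buckets, so hasNext() is just a null check.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class SparseTable<T> implements Iterable<T> {
    static final class Node<E> {
        final E value;
        final Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node<T>[] table;

    @SuppressWarnings("unchecked")
    SparseTable(int capacity) {
        table = (Node<T>[]) new Node<?>[capacity];
    }

    // Prepends to the chain in the given bucket.
    void addToBucket(int bucket, T value) {
        table[bucket] = new Node<>(value, table[bucket]);
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> next; // next node to return, null when exhausted
            private int index;    // next bucket to examine

            {
                advance(); // like HashIterator's constructor: find the first entry
            }

            // If `next` is null, scan forward to the next non-empty bucket.
            private void advance() {
                while (next == null && index < table.length) {
                    next = table[index++];
                }
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public T next() {
                Node<T> e = next;
                if (e == null) {
                    throw new NoSuchElementException();
                }
                next = e.next; // stay in the same chain if possible
                advance();     // otherwise move on to the next non-empty bucket
                return e.value;
            }
        };
    }

    public static void main(String[] args) {
        SparseTable<String> t = new SparseTable<>(8);
        t.addToBucket(1, "a");
        t.addToBucket(1, "b");
        t.addToBucket(5, "c");
        List<String> out = new ArrayList<>();
        for (String s : t) {
            out.add(s);
        }
        System.out.println(out); // [b, a, c]
    }
}
```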

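Designing a traversable class
As discussed above, making a class traversable lets callers access its elements with a for-each loop or through the Iterator API.

Below is a custom class designed to be traversable.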

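    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.Random;

    // An Iterable that yields `size` random integers.
    public class Generator implements Iterable<Integer> {
        private final int size;
        private final Random r = new Random();

        public Generator() {
            this(10);
        }

        public Generator(int size) {
            this.size = size;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Itr();
        }

        private Integer getElement() {
            return r.nextInt();
        }

        // Non-static inner class: it reads the outer instance's size and getElement().
        private class Itr implements Iterator<Integer> {
            private int index; // number of elements returned so far

            @Override
            public boolean hasNext() {
                return index < size;
            }

            @Override
            public Integer next() {
                // The Iterator contract requires NoSuchElementException when exhausted.
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                index++;
                return getElement();
            }
        }
    }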

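The key points in designing a traversable class are:

The class implements Iterable and its iterator() method, which returns an instance of the iterator class.
The iterator class implements Iterator and its hasNext() and next() methods; next() must throw NoSuchElementException when no elements remain.

Things to note:

The iterator class is usually a non-static inner class, because it typically needs access to the fields of the outer instance.
Decide from the class's properties and intended use whether the iterator should offer a richer API, for example implementing ListIterator as LinkedList does. Without a specific need, implementing the basic Iterator interface is enough.
Whether the iterator detects modifications made during traversal and throws ConcurrentModificationException should be stated in the class's documentation comment.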
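Putting these points together: a minimal sketch of a hypothetical IntBag class whose iterator is fail-fast in the same style as LinkedList and HashMap (a modCount in the outer class, an expectedModCount snapshot in the iterator, resynchronized by Iterator.remove()), with the behavior stated in its documentation comment.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * A growable bag of ints (hypothetical example class). Not thread-safe.
 * Its iterator is fail-fast on a best-effort basis: if the bag is
 * structurally modified during iteration other than through
 * Iterator.remove(), the iterator throws ConcurrentModificationException.
 */
class IntBag implements Iterable<Integer> {
    private int[] data = new int[4];
    private int size;
    private int modCount; // incremented on every structural modification

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
        modCount++;
    }

    int size() {
        return size;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Itr();
    }

    private class Itr implements Iterator<Integer> {
        private int cursor;                      // index of the next element
        private int lastReturned = -1;           // -1 if there is nothing to remove
        private int expectedModCount = modCount; // snapshot taken at creation

        @Override
        public boolean hasNext() {
            return cursor < size;
        }

        @Override
        public Integer next() {
            checkForComodification();
            if (cursor >= size) {
                throw new NoSuchElementException();
            }
            lastReturned = cursor;
            return data[cursor++];
        }

        @Override
        public void remove() {
            if (lastReturned < 0) {
                throw new IllegalStateException();
            }
            checkForComodification();
            // Shift the tail left over the removed slot.
            System.arraycopy(data, lastReturned + 1, data, lastReturned, size - lastReturned - 1);
            size--;
            cursor = lastReturned;
            lastReturned = -1;
            modCount++;
            expectedModCount = modCount; // stay in sync, like the JDK iterators
        }

        private void checkForComodification() {
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
        }
    }
}
```

Removing through the iterator keeps the two counters equal, while calling add() inside a for-each loop over the same bag makes the next iteration step throw.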