2017.11.30
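I have started reading the JDK source. An algorithms book I read mentions Arrays.sort(), and it is also what I reach for when I am being lazy and just need something sorted quickly, so I decided to read through it properly.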
/**
* The minimum array length below which a parallel sorting
* algorithm will not further partition the sorting task. Using
* smaller sizes typically results in memory contention across
* tasks that makes parallel speedups unlikely.
*/
private static final int MIN_ARRAY_SORT_GRAN = 1 << 13;
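This is the very first thing in the file. Read literally, it defines a constant with the value 8192 (1 << 13). The comment says that this is a minimum array length: once a piece of the array is shorter than this, the parallel sorting algorithm will not partition the sorting task any further. Using smaller sizes typically leads to memory contention across tasks, which makes a parallel speedup unlikely.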
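To make the role of this threshold concrete, here is a minimal sketch of how a parallel sort could pick its per-task granularity. The class `GranularitySketch` and the methods `granularity` and `worthSplitting` are hypothetical names of mine. The formula follows my reading of how `Arrays.parallelSort` uses the constant in JDK 8, so treat it as an assumption and check it against your own JDK version.

```java
class GranularitySketch {
    // Same value as the constant quoted above: 1 << 13 == 8192.
    static final int MIN_ARRAY_SORT_GRAN = 1 << 13;

    /**
     * Hypothetical helper: the smallest sub-array length a task may be split
     * down to, given the array length n and the parallelism p.
     * Assumption: aim for about four tasks per worker, never below the minimum.
     */
    static int granularity(int n, int p) {
        int g = n / (p << 2);
        return (g <= MIN_ARRAY_SORT_GRAN) ? MIN_ARRAY_SORT_GRAN : g;
    }

    /** Hypothetical helper: small arrays or a single worker stay sequential. */
    static boolean worthSplitting(int n, int p) {
        return n > MIN_ARRAY_SORT_GRAN && p > 1;
    }

    public static void main(String[] args) {
        System.out.println(MIN_ARRAY_SORT_GRAN);       // 8192
        System.out.println(granularity(100_000, 4));   // 6250 is too small, so 8192
        System.out.println(granularity(1 << 20, 4));   // 65536
        System.out.println(worthSplitting(5_000, 8));  // false
    }
}
```

The point of the clamp is that a task never gets a slice shorter than 8192 elements, however many workers are available.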
Arrays.sort() is overloaded for the primitive types, and for primitives it uses Dual-Pivot Quicksort (a quicksort with two pivots, which partitions the range three ways):
* Vladimir Yaroslavskiy, Jon Bentley, and Josh Bloch. The algorithm
* offers O(n log(n)) performance on many data sets that cause other
* quicksorts to degrade to quadratic performance, and is typically
* faster than traditional (one-pivot) Quicksort implementations.
*
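- As for the order of precedence: when the array length is less than QUICKSORT_THRESHOLD, dual-pivot quicksort is used; otherwise the code goes down the merge sort path, and it falls back to quicksort if the array turns out not to be made of a small number of sorted runs.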
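- leftmost indicates whether the range being sorted is the leftmost part of the array, that is, whether sorting starts from the first element; otherwise the range starts somewhere in the middle. I stepped through it with breakpoints: because the left argument was 0, passing leftmost as false throws an exception, presumably because that branch reads the elements to the left of the range. The non-leftmost branch is efficient on a range that is already in ascending order (duplicate keys included), because an if clause returns directly in that case.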
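The precedence described in the first bullet can be sketched as a small decision function. This is only an illustration: `SortStrategySketch` and `chooseStrategy` are hypothetical names, and the two threshold values are the ones I remember from JDK 8's DualPivotQuicksort, so they are an assumption and may differ in other JDK versions.

```java
class SortStrategySketch {
    // Assumed values, as I remember them from JDK 8; verify against your JDK.
    static final int QUICKSORT_THRESHOLD = 286;
    static final int INSERTION_SORT_THRESHOLD = 47;

    /** Hypothetical helper: names the strategy tried first for a given length. */
    static String chooseStrategy(int length) {
        if (length < INSERTION_SORT_THRESHOLD) {
            return "insertion sort";          // tiny arrays
        }
        if (length < QUICKSORT_THRESHOLD) {
            return "dual-pivot quicksort";    // small arrays
        }
        return "merge sort attempt";          // large arrays, if nearly sorted
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " -> " + chooseStrategy(n));
        }
    }
}
```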
// Inexpensive approximation of length / 7
int seventh = (length >> 3) + (length >> 6) + 1;
/*
* Sort five evenly spaced elements around (and including) the
* center element in the range. These elements will be used for
* pivot selection as described below. The choice for spacing
* these elements was empirically determined to work well on
* a wide variety of inputs.
*/
int e3 = (left + right) >>> 1; // The midpoint
int e2 = e3 - seventh;
int e1 = e2 - seventh;
int e4 = e3 + seventh;
int e5 = e4 + seventh;
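Next it defines a value that approximates 1/7 of the array length (length/8 + length/64 + 1). Starting from the midpoint, (left + right) / 2, and stepping by that seventh in both directions, it picks five evenly spaced positions, e1 through e5.
// Sort these elements using insertion sort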
if (a[e2] < a[e1]) { int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }

if (a[e3] < a[e2]) { int t = a[e3]; a[e3] = a[e2]; a[e2] = t;
    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
}
if (a[e4] < a[e3]) { int t = a[e4]; a[e4] = a[e3]; a[e3] = t;
    if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
        if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
    }
}
if (a[e5] < a[e4]) { int t = a[e5]; a[e5] = a[e4]; a[e4] = t;
    if (t < a[e3]) { a[e4] = a[e3]; a[e3] = t;
        if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
            if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
        }
    }
}
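From this, together with the comment above, we can see that the elements at these five positions are sorted with insertion sort (for example, if a[e2] < a[e1] the two are swapped, and so on). It is not a plain loop-based insertion sort, though: it is unrolled by hand for exactly five elements, and each new element t is shifted left only as far as it needs to go, so larger elements drift toward the back with very few moves. (This path is taken when the length is at least the insertion sort threshold: the array is handled with dual-pivot quicksort plus insertion sort, and this insertion sort of the five samples runs before the quicksort partitioning.)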
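To check my understanding of the sampling step, I wrote a small sketch that computes the five positions and sorts the elements found there. `PivotSampleSketch`, `sampleIndices` and `sortSamples` are hypothetical names of mine, and the loop-based insertion sort is a stand-in for the hand-unrolled version in the JDK; it should give the same ordering, but it is not the JDK's code.

```java
import java.util.Arrays;

class PivotSampleSketch {
    /** Returns the sample positions e1..e5 for the inclusive range [left, right]. */
    static int[] sampleIndices(int left, int right) {
        int length = right - left + 1;
        // Inexpensive approximation of length / 7, as in the excerpt above.
        int seventh = (length >> 3) + (length >> 6) + 1;
        int e3 = (left + right) >>> 1; // the midpoint
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;
        return new int[] {e1, e2, e3, e4, e5};
    }

    /** Insertion sort restricted to the positions listed in idx. */
    static void sortSamples(int[] a, int[] idx) {
        for (int i = 1; i < idx.length; i++) {
            int t = a[idx[i]];
            int j = i - 1;
            // Shift larger sampled values one sample position to the right.
            while (j >= 0 && a[idx[j]] > t) {
                a[idx[j + 1]] = a[idx[j]];
                j--;
            }
            a[idx[j + 1]] = t;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        for (int i = 0; i < a.length; i++) a[i] = 100 - i; // descending input
        int[] idx = sampleIndices(0, a.length - 1);
        System.out.println(Arrays.toString(idx)); // [21, 35, 49, 63, 77]
        sortSamples(a, idx);
        for (int i : idx) System.out.print(a[i] + " ");
        System.out.println();
    }
}
```

For a length of 100 the approximation gives 12 + 1 + 1 = 14, against a true value of about 14.3, so the five samples sit 14 positions apart around index 49.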
// Pointers
int less = left; // The index of the first element of center part
int great = right; // The index before the first element of right part
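Two markers are defined here. Because there are two pivots, the array is split into four regions while partitioning is in progress: elements less than pivot1, elements between pivot1 and pivot2, elements not yet examined, and elements greater than pivot2. The two markers are effectively the start and the end of the middle stretch (the center part plus whatever has not been examined yet).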
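To see how the two markers work, here is a stripped-down dual-pivot quicksort. It is a simplified textbook-style sketch, not the JDK implementation: it takes the first and last elements as pivots instead of sampling five positions, and it has no insertion sort or merge sort paths. `SimpleDualPivotSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

class SimpleDualPivotSketch {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int left, int right) {
        if (left >= right) return;
        if (a[left] > a[right]) swap(a, left, right);
        int pivot1 = a[left], pivot2 = a[right]; // pivot1 <= pivot2

        int less = left + 1;   // first index of the center part
        int great = right - 1; // index before the first element of the right part

        for (int k = less; k <= great; k++) {
            if (a[k] < pivot1) {
                swap(a, k, less++);            // grow the left part
            } else if (a[k] > pivot2) {
                while (k < great && a[great] > pivot2) great--;
                swap(a, k, great--);           // grow the right part
                if (a[k] < pivot1) swap(a, k, less++);
            }
            // otherwise a[k] stays in the center part
        }

        // Put the pivots into their final positions.
        swap(a, left, --less);
        swap(a, right, ++great);

        sort(a, left, less - 1);       // elements < pivot1
        sort(a, less + 1, great - 1);  // pivot1 <= elements <= pivot2
        sort(a, great + 1, right);     // elements > pivot2
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 8, 2, 5, 5, 0, 6, 4, 3};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```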
/*
* If center part is too large (comprises > 4/7 of the array),
* swap internal pivot values to ends.
*/
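If the center part (the elements between pivot1 and pivot2) takes up more than 4/7 of the whole array, the elements inside it that are equal to the pivot values are swapped out to the two ends, with the aim of shrinking the center region. What follows is the usual pair of while loops that advance the quicksort markers.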
/*
* Partitioning:
*
*   left part         center part                  right part
* +----------------------------------------------------------+
* | == pivot1 |  pivot1 < && < pivot2  |    ?    | == pivot2 |
* +----------------------------------------------------------+
*              ^                        ^       ^
*              |                        |       |
*             less                      k     great
*
* Invariants:
*
* all in (*, less) == pivot1
* pivot1 < all in [less, k) < pivot2
* all in (great, *) == pivot2
*
* Pointer k is the first index of ?-part.
*/
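So the regions are: the two regions equal to the pivots at either end, and between them the region strictly between the two pivots plus the region still to be processed.
if (a[great] == pivot1)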
{ // a[great] < pivot2
a[k] = a[less];
/*
* Even though a[great] equals to pivot1, the
* assignment a[less] = pivot1 may be incorrect,
* if a[great] and pivot1 are floating-point zeros
* of different signs. Therefore in float and
* double sorting methods we have to use more
* accurate assignment a[less] = a[great].
*/
a[less] = pivot1;
++less;
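If a[great] == pivot1, the element found at great belongs in the left part: a[k] takes over the value that was at a[less], pivot1 is written into a[less], and less moves one step to the right, so the left part grows by one element.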
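The step above is one branch of the pass that moves elements equal to the pivots out of the center part. Below is my own sketch of that whole pass for int arrays, written from the invariants in the comment block; `EqualsToEndsSketch` and `moveEqualsToEnds` are hypothetical names. It assumes pivot1 < pivot2 and that every element in [less, great] already lies in [pivot1, pivot2]. Unlike the JDK code, it adds bounds checks to the two initial scans, because here no pivot is guaranteed to sit outside the range.

```java
import java.util.Arrays;

class EqualsToEndsSketch {
    /**
     * Moves values equal to pivot1 to the front and values equal to pivot2
     * to the back of a[less..great]. Returns {less, great}, the new bounds
     * of the part that is strictly between the pivots.
     */
    static int[] moveEqualsToEnds(int[] a, int less, int great,
                                  int pivot1, int pivot2) {
        // Skip elements that are already in place at either end.
        while (less <= great && a[less] == pivot1) ++less;
        while (less <= great && a[great] == pivot2) --great;

        outer:
        for (int k = less; k <= great; k++) {
            int ak = a[k];
            if (ak == pivot1) {            // move a[k] to the left part
                a[k] = a[less];
                a[less] = ak;
                ++less;
            } else if (ak == pivot2) {     // move a[k] to the right part
                while (a[great] == pivot2) {
                    if (great-- == k) break outer;
                }
                if (a[great] == pivot1) {  // the case discussed above
                    a[k] = a[less];
                    a[less] = pivot1;
                    ++less;
                } else {                   // pivot1 < a[great] < pivot2
                    a[k] = a[great];
                }
                a[great] = ak;
                --great;
            }
        }
        return new int[] {less, great};
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 1, 5, 4, 1, 2, 5, 1};
        int[] bounds = moveEqualsToEnds(a, 0, a.length - 1, 1, 5);
        System.out.println(Arrays.toString(a));      // [1, 1, 1, 2, 4, 3, 5, 5, 5]
        System.out.println(Arrays.toString(bounds)); // [3, 5]
    }
}
```

Because this version sorts ints, writing pivot1 directly into a[less] is safe; the comment in the excerpt explains why the float and double variants have to copy a[great] instead.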