数量级
接前面
习题非递归快速排序讲了数据结构在排序算法中的重要作用。排序对象是随机自然数,从一万开始。
小结
书上对常见的排序算法做了小结。
一万: O ( n 2 ) O(n^2) O(n2)
里面最慢的是冒泡排序,八秒多,因为它要一个一个地比。
list_sort 0.002000093460083008
heapsort 0.013000011444091797
bubble减少交换 3.235999584197998
bub 8.67300009727478
insert 4.486999988555908
shellSort 0.04900026321411133
quick 0.05900001525878906
quick_partition 0.06200003623962402
MedianSortBFPRT 0.12800002098083496
merge 0.04200005531311035
五万: n ∗ log 2 n n*\log_{2}{n} n∗log2n
这里插入冒泡都到一两百,要淘汰了。
淘汰:
select 136.42199969291687
bubble减少交换 81.0329999923706
bub 214.32599997520447
insert 113.50599980354309
list_sort 0.0070002079010009766
heapSort done 0.372999906539917
shellSort 0.3559999465942383
quick 0.18700003623962402
quick_partition 0.1510000228881836
MedianSortBFPRT 0.6370000839233398
merge 0.25099992752075195
pivotsort 0.22099995613098145
五十万:中值排序
这里最慢的是MedianSortBFPRT中值排序,八秒多。先找到数组的中值,把数分两边;再递归两边的数据,直到排好序。
list_sort 0.10700011253356934
heapsort 0.7980000972747803
heapSort done 5.137999773025513
shellSort 6.007999897003174
quick 1.9620001316070557
quick_partition 2.17900013923645
MedianSortBFPRT 8.876000165939331
merge 3.1600000858306885
pivotsort 2.5890002250671387
代码
MedianSortBFPRT 是中值排序的改进版,从四个值里选中值。从C语言代码改过来的。应该比较适合指针操作的,说到指针还有表插入排序。
def cmp(a, b):
if a < b:
return -1
elif a > b:
return 1
else:
return 0
def medianOfFour(ar, left, gap):
pos1, pos2, pos3, pos4 = left, left+gap, left+2*gap, left+3*gap
a1, a2, a3, a4 = ar[pos1], ar[pos2], ar[pos3], ar[pos4]
if cmp(a1, a2) <= 0:
if cmp(a2, a3) <= 0:
if cmp(a2, a4) <= 0:
if cmp(a3, a4) > 0:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
ar[pos2], ar[pos3] = ar[pos3], ar[pos2]
else:
if cmp(a1, a3) <= 0:
if cmp(a3, a4) <= 0:
if cmp(a2, a4) <= 0:
ar[pos2], ar[pos3] = ar[pos3], ar[pos2]
else:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
if cmp(a1, a4) <= 0:
if cmp(a2, a4) <= 0:
ar[pos2], ar[pos3] = ar[pos3], ar[pos2]
else:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
ar[pos1], ar[pos3] = ar[pos3], ar[pos1]
else:
if cmp(a1, a3) <=0:
if cmp(a1, a4) <=0:
if cmp(a3, a4) > 0:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
ar[pos3], ar[pos1] = ar[pos1], ar[pos3]
else:
if cmp(a2, a3) <=0:
if cmp(a3, a4) <=0:
if cmp(a1, a4) <= 0:
ar[pos1], ar[pos3] = ar[pos3], ar[pos1]
else:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
if cmp(a2, a4) <=0:
if cmp(a1, a4) <= 0:
ar[pos1], ar[pos3] = ar[pos3], ar[pos1]
else:
ar[pos3], ar[pos4] = ar[pos4], ar[pos3]
else:
ar[pos2], ar[pos3] = ar[pos3], ar[pos2]
def insertion(ar, low, right, gap):
for loc in range(low+gap, right, gap):
i = loc - gap
value = ar[loc]
while i>=low and cmp(ar[i],value)>0:
ar[i+gap] = ar[i]
i -= gap
ar[i+gap] = value
def medianOfMedians(ar, left, right, gap):
'''gap是1也会是最后调用的span'''
span = 4*gap
num = (right-left+1)//span
if (0==num):
insertion(ar, left, right, gap)
num = (right-left+1)//gap
return int(left + gap*(num-1)//2)
for s in range(left, right, span):
if s + span <= right:
medianOfFour(ar, s, gap)
if (4>num):
insertion(ar, left+span//2, right, span)
return int(left + num*span//2)
else:
return medianOfMedians(ar, left+span//2, s-1, span)
def partition(alist, left, right, pivotIndex):
'''返回第几大的前区'''
pivot = alist[pivotIndex]
store = left
alist[right], alist[pivotIndex] = alist[pivotIndex], alist[right]
for i in range(left, right):
if alist[i] <= pivot:
alist[i] ,alist[store] = alist[store], alist[i]
store += 1
alist[right], alist[store] = alist[store], alist[right]
return store
def selectMedian(ar, left, right):
k =(right - left + 1)//2
while k > 0:
idx = medianOfMedians(ar, left, right, 1)
pivotIndex = partition(ar, left, right, idx)
p = left + k
if p == pivotIndex:
return pivotIndex
elif p < pivotIndex:
right = pivotIndex - 1
else:
k = k - (pivotIndex - left + 1)
left = pivotIndex + 1
return left
def medianSort(alist, left, right):
if (left >= right):return
mid = (right - left + 1) // 2
me = selectMedian(alist, left, right)
medianSort(alist, left, left + mid -1)
medianSort(alist, left + mid + 1, right)
五百万: s h e l l shell shell排序 O ( n d ) ( 1 < d < 1.5 ) O(n^d)(1<d<1.5) O(nd)(1<d<1.5)
MedianSortBFPRT已经137秒,可以淘汰了。
s
h
e
l
l
shell
shell排序98秒下一级淘汰。
淘汰:
MedianSortBFPRT 137.97499990463257
list_sort 1.5319998264312744
shellSort 98.6560001373291
merge 39.10199999809265
pivotsort 30.817999839782715
quick_medianOf3 26.94600009918213
heapSort done 66.4060001373291
quick 25.361000299453735
quick非递归 25.13599991798401
quick_partition 26.33399987220764
quick_median_pivot_2cursor 26.187999963760376
quick_end_pivot 24.217000007629395
quick_end 23.700000047683716
radixSort 24.115000009536743
一千万:快速排序
pivotsort,quick_partition都是快速排序,使用轴交换数据。 像前面的quick_medianOf3 是用前中后三个值大小在中间的那个做轴,quick_median_pivot_2cursor 正中间的做轴,quick_end_pivot是以最后一个数做轴,quick_end在数量少到10的时候用插入排序。
if low - 1 - start <= 10:
insertionSort(alist[start:low])
else:
quick_end(alist, start, low - 1)
if end - 1 - low <= 10:
insertionSort(alist[low + 1:end + 1])
else:
quick_end(alist, low + 1, end)
淘汰:
shellSort 276.1419999599457
list_sort 3.2100000381469727
heapSort done 142.9500002861023
quick 90.61400008201599
quick_partition 146.29100012779236
merge 81.66299986839294
pivotsort 111.06699991226196
快速排序和归并排序merge 也要淘汰了。
五千万:基数排序 O ( d ( n + r ) ) O(d(n+r)) O(d(n+r))
看数据下一个淘汰的是基数排序radixSort和堆排序heapSort。堆没得说像二叉树一样。但是基数尽然比堆还快了600多秒,这里就要看题目了,排序的是自然数,要是带小数的就不能用基数排序了。
淘汰:
quick 1389.9990000724792
pivotsort 1236.2779998779297
list_sort 17.25499963760376
radixSort 170.40100002288818
heapSort done 848.5410001277924
一亿:点数排序 O ( n ) O(n) O(n)
其它的都打印不出来了,python列表排序sort()也要34秒了。
list_sort 34.79900026321411
代码
里面是有一亿重复的数据(0到9)。
本想用x=random.sample(range(1000),10000)生成重复的数据,报错了:
ValueError Sample larger than population or is negative
只好用这种方式:模运算还是很好用的。
x = [i % 10 for i in range(100000000)]
random.shuffle(x)
x = [i % 10 for i in range(100000000)]
random.shuffle(x)
from collections import defaultdict
def countingSort(A):
B = defaultdict(lambda : 0) #buckets
for i in A:
B[i] += 1
idx = 0
for i in range(len(B)):
while B[i] > 0:
A[idx] = i
idx += 1
B[i] -= 1
#x = random_int_list(0,10,50)
#x=random.sample(range(1000),10000)
dianshux = x.copy()
start = time.time()
countingSort(dianshux)
end = time.time()
print("点数排序",end - start)
pythenlistx = x.copy()
start = time.time()
pythenlistx.sort()
end = time.time()
print("list_sort",end - start)
打印
先看一下这个打印:
点数排序 28.121000051498413
list_sort 7.856000185012817
这也是一亿随机数据啊,怎么list_sort从上面的34秒降到了7秒。看来打败list_sort没有那么容易。
十亿:散列排序
怎么把散的数据列起来?
点数排序 294.4590001106262
list_sort 134.59100008010864
没有打败。
桶排序(hash)
不能用
def hash(a):
return a // 100 #改这里应该好点吧,还是要加大桶
这样的散列函数(hash)把你排在比较低的位置。
匹配——散列法讲到了散列法。散列函数有数字分析法、折叠法、中平方法、基数转换法、除法…还有很多。匹配——rabin_karp是怎么滚动的?讲到的取模运算散列方法把我折磨得不轻。只要充分分析好数据,运用更多的数学知识,一定能打败list_sort。
更新
1,希尔排序的平均时间复杂度
O
(
n
d
)
(
1
<
d
<
1.5
)
O(n^d)(1<d<1.5)
O(nd)(1<d<1.5)
2,基数排序的平均时间复杂度
O
(
d
(
n
+
r
)
)
O(d(n+r))
O(d(n+r))
其中r为所采取的基数,n为桶数,d为关键码的位数。
2,点数排序的平均时间复杂度
O
(
n
)
O(n)
O(n)