The difference between heapify and pushing elements onto a heap one by one


License: CC 4.0 BY-SA

Original link: http://blog.51cto.com/thuhak/1403491


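This article examines the time complexity of the standard heap insertion algorithm and presents the more efficient, linear-time heapify algorithm, which rearranges an array in place and turns it into a heap in O(n) time. Compared with the traditional one-by-one insertion, heapify needs fewer comparisons to build the heap.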

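The standard algorithm for inserting an element into a heap is easy to understand, and it is easy to see that inserting one element into a heap of n elements costs lg n in the worst case.

The most obvious way to put every element of an array into a heap is to push them one by one. The cost of pushing n elements is then

Sum[Log2[i],{i,1,n}].

This equals Log2[n!]. By Stirling's formula, as n tends to infinity n! is approximately (2*Pi*n)^(1/2)*(n/E)^n.

The total cost is therefore about n*Log2[n/E]+O(Log2[n]), which is Θ(n lg n) and so grows faster than O(n).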
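As a rough numerical check of the estimate above, the following sketch compares the exact sum of log2(i) with the value given by Stirling's formula. The function names push_cost and stirling_estimate are made up for this illustration and do not come from any library.

```python
import math


def push_cost(n):
    """Worst-case cost model from the text: sum of log2(i) for i = 1..n."""
    return sum(math.log2(i) for i in range(1, n + 1))


def stirling_estimate(n):
    """log2 of Stirling's approximation sqrt(2*pi*n) * (n/e)**n."""
    return 0.5 * math.log2(2 * math.pi * n) + n * math.log2(n / math.e)


for n in (10, 1000, 100000):
    exact = push_cost(n)
    approx = stirling_estimate(n)
    # The ratio exact / n keeps growing with n, which is the n lg n behaviour.
    print(n, round(exact, 2), round(approx, 2), round(exact / n, 2))
```

The two columns agree closely even for small n, and the cost per element keeps growing, so the total is not linear.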

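Intuitively, building a heap ought to be cheaper than that. Fully sorting n elements costs only n lg n, and a heap is only partially ordered, so building one should not come close to n lg n.

This intuition is correct: heapify is a linear-time algorithm.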

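Take the implementation in Python's standard library module heapq as an example (the code below is the Python 2 version, which is why it uses xrange and cmp_lt):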

def cmp_lt(x, y):
    # Use __lt__ if available; otherwise, try __le__.
    # In Py3.x, only __lt__ will be called.
    return (x < y) if hasattr(x, '__lt__') else (not y <= x)


def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(xrange(n//2)):
        _siftup(x, i)


def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if cmp_lt(newitem, parent):
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem


def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]):
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

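heapify involves three functions: heapify calls _siftup on the first half of the elements (indices n//2 - 1 down to 0), and _siftup in turn calls _siftdown.

How does this work?

The first part of _siftup(heap, pos) is straightforward: it moves the element at pos all the way down to a leaf along the path of smaller children, and each child chosen along the way moves up one level. _siftdown(heap, startpos, pos) is then the standard heap insertion step: it moves the element now at pos back up for as long as it is smaller than its parent, and it never rises above startpos, the root of the subtree being processed.

This can be read as building the heap bottom-up, starting from the second-to-last level. The first iterations turn each group of 3 nodes in the bottom two levels into a small heap. The loop then moves upward, and each time it adds an element from the level above, it merges the two heaps below that element into one. This works because, at every level, the two children of the new element are already the minimum elements of their respective subtrees.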
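The bottom-up process can be watched step by step with a small Python 3 sketch. It is not the library code: sift is a hypothetical, self-contained re-implementation of the same strategy as _siftup plus _siftdown (sink to a leaf, then bubble back up no higher than the start), and subtree_is_heap is a helper invented here to check the invariant after each step.

```python
def sift(heap, pos):
    """Sink heap[pos] to a leaf along the smaller children, then bubble it back up."""
    end = len(heap)
    start = pos
    item = heap[pos]
    child = 2 * pos + 1
    while child < end:
        right = child + 1
        # Pick the smaller of the two children.
        if right < end and not heap[child] < heap[right]:
            child = right
        heap[pos] = heap[child]  # the smaller child moves up one level
        pos = child
        child = 2 * pos + 1
    # Bubble the item back up, but never above the subtree root `start`.
    while pos > start:
        parent = (pos - 1) >> 1
        if item < heap[parent]:
            heap[pos] = heap[parent]
            pos = parent
        else:
            break
    heap[pos] = item


def subtree_is_heap(heap, root):
    """Return True if every node under `root` is >= its parent."""
    stack = [root]
    while stack:
        i = stack.pop()
        for c in (2 * i + 1, 2 * i + 2):
            if c < len(heap):
                if heap[c] < heap[i]:
                    return False
                stack.append(c)
    return True


data = [7, 6, 5, 4, 3, 2, 1]
for i in reversed(range(len(data) // 2)):
    sift(data, i)
    # After processing index i, the subtree rooted at i is already a heap.
    assert subtree_is_heap(data, i)
    print(i, data)
```

Each printed line shows one more subtree turned into a heap, with the last iteration (index 0) merging the two 3-node heaps under the root.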

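The argument works much like mathematical induction: the claim holds for the base case, and if it holds for n it also holds for n+1, so it holds for every element. With one-by-one insertion the problem size changes linearly (the subproblem below n is n-1), whereas with merging the problem size is halved. It is as if, while inserting elements one by one, you stopped inserting into the original heap on reaching the fourth element and inserted into a new heap instead, and then merged the two heaps once they were the same size.

Its running time is therefore T(n)=2T(n/2)+2lg(n/2).

Since 2lg(n/2) grows more slowly than n^(1-ε) for some ε>0, this falls exactly into the first case of the master theorem, so the complexity is O(n).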
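To see the linear bound without appealing to the master theorem, the recurrence can simply be evaluated. The sketch below assumes n is a power of two and takes T(1) = 0 as the base case; both are assumptions made for this illustration, since the text does not state them.

```python
def T(n):
    """Evaluate T(n) = 2*T(n/2) + 2*lg(n/2) for n a power of two, assuming T(1) = 0."""
    if n == 1:
        return 0
    half = n // 2
    lg_half = half.bit_length() - 1  # exact lg(half) when half is a power of two
    return 2 * T(half) + 2 * lg_half


for k in range(1, 21):
    n = 2 ** k
    # T(n) / n stays below 2 however large n gets, so T(n) grows linearly.
    print(n, T(n), round(T(n) / n, 4))
```

Under these assumptions the values match the closed form 2n - 2lg n - 2, which makes the O(n) growth explicit.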

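Take an example with 7 elements, x=[a,b,c,d,e,f,g], and count only the comparisons.

In the worst case, one-by-one insertion proceeds as follows (a, b and c here stand for whatever elements currently occupy the root and its two children):

1. Insert a.

2. Insert b: b is compared once with a.

3. Insert c: c is compared once with a.

4. Insert d: d is compared once with b, then once with a.

5. Insert e: e is compared once with b, then once with a.

6. Insert f: f is compared once with c, then once with a.

7. Insert g: g is compared once with c, then once with a.

That is 10 comparisons in total.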
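The count above can be reproduced with the standard heapq module by wrapping each value in a class that counts calls to `<`. This is only a sketch: Counted, count_pushes and count_heapify are hypothetical helpers written for this check, and a descending input is used as the worst case because every new element then has to rise to the root.

```python
import heapq


class Counted:
    """Wraps a value and counts every '<' comparison made on instances."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value


def count_pushes(values):
    """Number of comparisons used when pushing the values one by one."""
    Counted.count = 0
    heap = []
    for v in values:
        heapq.heappush(heap, Counted(v))
    return Counted.count


def count_heapify(values):
    """Number of comparisons used when building the heap with heapify."""
    Counted.count = 0
    heap = [Counted(v) for v in values]
    heapq.heapify(heap)
    return Counted.count


worst = [7, 6, 5, 4, 3, 2, 1]  # descending: each push bubbles up to the root
print("heappush comparisons:", count_pushes(worst))
print("heapify comparisons: ", count_heapify(worst))
```

Running the same two functions on longer descending lists shows the gap between the two approaches widening as n grows.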