Asymptotic Analysis Notes
1. Introduction
Asymptotic analysis is mainly used to evaluate the performance of code.
Old saying:
An engineer will do for a dime what any fool will do for a dollar.
Bad code: inappropriate data structures, convoluted, slow, uses a lot of memory.
Good code: appropriate data structures, concise, efficient, reasonable memory usage.
2. Big-Theta Notation ($\Theta$)
2.1 Example: Duplicate Detection
Consider the complexity of two methods for finding a duplicate element in the following sorted list:
-3 | -1 | 2 | 4 | 4 | 8 | 12 |
---|---|---|---|---|---|---|
Implementation of the comparison algorithm
Objective: determine whether there are any duplicate elements in the array.
A basic idea is to check every possible pair:
# A silly method
def sillySearch(x):
    OperationCount = 0
    print('I compare every possible pair!')
    lengthOfX = len(x)
    for _ in range(lengthOfX):
        currentA = x[_]
        for i in range(_ + 1, lengthOfX):
            OperationCount += 1
            print("Comparing ", str(x[_]) + "==" + str(x[i]), "Operation Count: ", str(OperationCount))
            if x[_] == x[i]:
                print(str(x[_]) + "=" + str(x[i]))
                return True
    return False
Program output:
I compare every possible pair!
Comparing -3==-1 Operation Count: 1
Comparing -3==2 Operation Count: 2
Comparing -3==4 Operation Count: 3
Comparing -3==4 Operation Count: 4
Comparing -3==8 Operation Count: 5
Comparing -3==12 Operation Count: 6
Comparing -1==2 Operation Count: 7
Comparing -1==4 Operation Count: 8
Comparing -1==4 Operation Count: 9
Comparing -1==8 Operation Count: 10
Comparing -1==12 Operation Count: 11
Comparing 2==4 Operation Count: 12
Comparing 2==4 Operation Count: 13
Comparing 2==8 Operation Count: 14
Comparing 2==12 Operation Count: 15
Comparing 4==4 Operation Count: 16
4=4
0:00:00.000192
[Finished in 1.9s]
A better approach is to compare only neighboring elements (since the list is sorted, any duplicates must be adjacent):
# a little bit cleverer method
# only consider the neighboring duplication
def betterSearch(x):
    OperationCount = 0
    print('I compare neighboring pairs!')
    lengthOfX = len(x)
    for _ in range(lengthOfX - 1):
        currentA = x[_]
        OperationCount += 1
        print("Comparing ", str(x[_]) + "==" + str(x[_ + 1]), "Operation Count: ", str(OperationCount))
        if x[_] == x[_ + 1]:
            print(str(x[_]) + "=" + str(x[_ + 1]))
            return True
    return False
Program output:
I compare neighboring pairs!
Comparing -3==-1 Operation Count: 1
Comparing -1==2 Operation Count: 2
Comparing 2==4 Operation Count: 3
Comparing 4==4 Operation Count: 4
4=4
0:00:00.000059
[Finished in 1.7s]
Let $n$ be the number of comparisons (`==`) performed. The silly baseline method took 192 units of time with $n = 16$; the improved method took 59 units of time with $n = 4$, where $n$ counts the comparison operations.
2.1.1 A small experiment (how running time grows with input size)
In the worst case (the duplicate placed at the very end of the array x), we measure how the running time (ms) of the two duplicate-detection algorithms varies with the length of x. As the array grows, the running time of the silly method deteriorates roughly quadratically, while the better method grows only linearly.
In the best case (all elements of the array identical, initializing the array with x = [0]*N), however, the two perform about the same.
The source code used for the tests and the plots is as follows:
# A silly method
def sillySearch(x):
    OperationCount = 0
    # print('I compare every possible pair!')
    lengthOfX = len(x)
    for _ in range(lengthOfX):
        currentA = x[_]
        for i in range(_ + 1, lengthOfX):
            OperationCount += 1
            # print("Comparing ...")
            if x[_] == x[i]:
                return True
    return False

# a little bit cleverer method
# only consider the neighboring duplication
def betterSearch(x):
    OperationCount = 0
    # print('I compare neighboring pairs!')
    lengthOfX = len(x)
    for _ in range(lengthOfX - 1):
        currentA = x[_]
        OperationCount += 1
        if x[_] == x[_ + 1]:
            # print(str(x[_]) + "=" + str(x[_ + 1]))
            return True
    return False

import datetime
from matplotlib import pyplot as plt

# x = [-3,-1,2,4,4,8,12]
sillyRecorder = []
betterRecorder = []

# test silly: worst case, the only duplicate is at the end of the array
for N in range(1, 1000, 100):
    x = list(range(0, N - 1))
    x.append(N - 2)
    start = datetime.datetime.now()
    sillySearch(x)
    end = datetime.datetime.now()
    sillyRecorder.append((end - start).microseconds)

# test better: same worst-case arrays
for N in range(1, 1000, 100):
    x = list(range(0, N - 1))
    x.append(N - 2)
    start = datetime.datetime.now()
    betterSearch(x)
    end = datetime.datetime.now()
    betterRecorder.append((end - start).microseconds)

plt.plot(sillyRecorder)
plt.plot(betterRecorder)
plt.legend(['silly', 'better'])
plt.xlabel('length of array (test case index)')
plt.ylabel('running time (microseconds)')
plt.show()
print(sillyRecorder)
2.2 Common ways to describe algorithmic complexity
To describe the complexity of an algorithm, we need a notation that is both simple and mathematically rigorous, so that the difference between the two algorithms above is obvious at a glance. Let us first review the common ways of describing algorithm performance, and then work our way toward Big-Theta notation ($\Theta$).
2.2.1 Exact measurement with a Python timer
1: Time how long a whole Python file takes to run. In a terminal, enter:
>> time python filename
2: Time a specific block of code:
import datetime
start = datetime.datetime.now()
# ... code block to be timed ...
end = datetime.datetime.now()
print(end - start)
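As a small, self-contained sketch (betterSearch is re-declared here in simplified form only so the snippet runs on its own), this times a single call with datetime and then averages many calls with the standard timeit module, which is usually more stable for short-running code:

```python
import datetime
import timeit

def betterSearch(x):
    # compare only neighboring pairs (assumes x is sorted)
    for i in range(len(x) - 1):
        if x[i] == x[i + 1]:
            return True
    return False

x = [-3, -1, 2, 4, 4, 8, 12]

# one-shot measurement with datetime, as above
start = datetime.datetime.now()
betterSearch(x)
end = datetime.datetime.now()
print(end - start)

# total time for 10000 repetitions with timeit (more stable)
print(timeit.timeit(lambda: betterSearch(x), number=10000))
```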
2.2.2 Counting how many times each step of the code executes
Consider an input of size $N$ (that is, the list to be searched has length $N$).
For the "silly" algorithm:
def sillySearch(x):
    OperationCount = 0
    lengthOfX = len(x)
    for _ in range(lengthOfX):
        currentA = x[_]
        for i in range(_ + 1, lengthOfX):
            OperationCount += 1
            if x[_] == x[i]:
                print(str(x[_]) + "=" + str(x[i]))
                return True
    return False
The number of times each operation executes, from the best case to the worst case, is as follows:
Operation | Count |
---|---|
`range` calls | 2 to $N+1$ |
`len` calls | 2 to $N+1$ |
`_` assignments | 1 to $N-1$ |
`i` assignments | 1 to $\frac{N^2-N}{2}$ |
equals (`==`) | 1 to $\frac{N^2-N}{2}$ |
array accesses | 2 to $N^2-N$ |
For the cleverer algorithm:
# a little bit cleverer method
# only consider the neighboring duplication
def betterSearch(x):
    OperationCount = 0
    lengthOfX = len(x)
    for _ in range(lengthOfX - 1):
        currentA = x[_]
        OperationCount += 1
        if x[_] == x[_ + 1]:
            return True
    return False
The number of times each operation executes, from the best case to the worst case, is as follows:
Operation | Count |
---|---|
`range` calls | 1 |
`len` calls | 1 |
`_` assignments | 1 to $N-1$ |
equals (`==`) | 1 to $N-1$ |
array accesses | 2 to $2N-2$ |
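To sanity-check the `==` rows of these two tables, here is a throwaway sketch (the instrumented helper names are mine, not from the note) that counts comparisons in the worst case, where the only duplicate sits at the end of a sorted array, and prints them next to the formulas $\frac{N^2-N}{2}$ and $N-1$:

```python
def countSillyComparisons(x):
    # instrumented sillySearch: return the number of == comparisons made
    count = 0
    n = len(x)
    for a in range(n):
        for b in range(a + 1, n):
            count += 1
            if x[a] == x[b]:
                return count
    return count

def countBetterComparisons(x):
    # instrumented betterSearch: return the number of == comparisons made
    count = 0
    for a in range(len(x) - 1):
        count += 1
        if x[a] == x[a + 1]:
            return count
    return count

for N in (4, 8, 16, 32):
    x = list(range(N - 1)) + [N - 2]   # worst case: the duplicate is at the end
    print(f"N={N}: silly={countSillyComparisons(x)} (formula {(N * N - N) // 2}), "
          f"better={countBetterComparisons(x)} (formula {N - 1})")
```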
2.2.2.1 Simplifying the operation-count metric
We can simplify the accounting above using the following rules:
- Only consider the worst case
- Choose a representative operation (a cost model)
- Ignore lower-order terms
- Ignore multiplicative constants
Suppose some algorithm has the following operation-count table:
Operation | Count |
---|---|
Op_1 | 1 |
Op_2 | 1 to $N$ |
Op_3 | 1 to $\frac{N^2-N}{2}$ |
Op_4 | 0 to $\frac{N^2+3N+2}{2}$ |
Applying the rules above, the table simplifies to:
Operation | Count |
---|---|
Op_3 | $N^2$ |
We now have a more rigorous, mathematically grounded way to characterize the complexity of this algorithm: $N^2$. In effect, the complexity of an algorithm is determined by the order of growth of its operation count in the very worst case.
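A quick numeric check (a throwaway sketch, not part of the original example) shows why dropping the lower-order term and the constant factor is safe: the ratio of the exact worst-case count $\frac{N^2-N}{2}$ to the simplified term $N^2$ settles to a constant as $N$ grows, so both have the same order of growth.

```python
# ratio of the exact worst-case comparison count to the simplified N^2 term
for N in (10, 100, 1000, 10000):
    exact = (N * N - N) / 2
    print(N, exact / (N * N))   # approaches the constant 1/2
```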
2.3 Big-Theta ($\Theta$)
- Example: $Q(N) = 3N^3 + N^2$
- Order of growth: $N^3$
function | Order of growth |
---|---|
$N^3 + 3N^4$ | $N^4$ |
$\frac{1}{N} + N^3$ | $N^3$ |
$\frac{1}{N} + 5$ | $1$ |
$Ne^N + N$ | $Ne^N$ |
$40\sin(N) + 4N^2$ | $N^2$ |
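One way to see these orders of growth numerically (a rough sketch; the sample values of $N$ are arbitrary) is to divide each function by its claimed order of growth and check that the ratio levels off near a nonzero constant rather than growing or shrinking toward zero:

```python
import math

# (function, claimed order of growth) pairs from the table above
cases = [
    (lambda N: N**3 + 3 * N**4,              lambda N: N**4),
    (lambda N: 1 / N + N**3,                 lambda N: N**3),
    (lambda N: 1 / N + 5,                    lambda N: 1),
    (lambda N: N * math.exp(N) + N,          lambda N: N * math.exp(N)),
    (lambda N: 40 * math.sin(N) + 4 * N**2,  lambda N: N**2),
]

for f, g in cases:
    ratios = [round(f(N) / g(N), 3) for N in (10, 100, 500)]
    print(ratios)   # each row flattens out around a nonzero constant
```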
Definition of Big-Theta: suppose we have a function $R(N)$ whose order of growth is $f(N)$. In "Big-Theta" notation we write this relationship as
$$R(N) \in \Theta(f(N))$$
For example:
- $N^3 + 3N^4 \in \Theta(N^4)$
- $\frac{1}{N} + N^3 \in \Theta(N^3)$
- $\frac{1}{N} + 5 \in \Theta(1)$
- $Ne^N + N \in \Theta(Ne^N)$
- $40\sin(N) + 4N^2 \in \Theta(N^2)$
Note that in some contexts people write $=$ instead of $\in$ in the statements above.
More precisely, saying $R(N) \in \Theta(f(N))$ is equivalent to saying that there exist two positive constants $k_1$ and $k_2$ such that
$$k_1 \cdot f(N) \leq R(N) \leq k_2 \cdot f(N)$$
holds for all $N$ with $N_0 \leq N$ (for some threshold $N_0$).
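As a worked check of this definition (the constants below are chosen here for illustration and are not from the note), take $R(N) = 40\sin(N) + 4N^2$ and $f(N) = N^2$. Since $-40 \leq 40\sin(N) \leq 40$ and $N^2 \geq 40$ whenever $N \geq 7$, we get
$$3N^2 \leq 4N^2 - 40 \leq 40\sin(N) + 4N^2 \leq 4N^2 + 40 \leq 5N^2,$$
so the definition is satisfied with $k_1 = 3$, $k_2 = 5$, and $N_0 = 7$, confirming $40\sin(N) + 4N^2 \in \Theta(N^2)$.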