Python Algorithms: A Detailed Implementation of Replacement-Selection Sort

License: CC 4.0 BY-SA

Tags:

#Python #Algorithms #External-Sorting #Replacement-Selection-Sort

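This article introduces replacement-selection sort, a method that makes external sorting more efficient by reducing the number of small files a large file is split into. It uses a loser tree to pick the record with the smallest key in memory, and it produces sorted small files by repeatedly replacing and selecting records.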

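The loser tree article noted that merging many small files in a single pass greatly reduces the number of file reads and writes, which cuts I/O time and speeds up sorting. What if the split could also produce fewer small files? When the small files cannot all be merged in one pass, producing fewer of them is another way to speed up sorting a large file.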

That is exactly what this article introduces: replacement-selection sort.
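
To make the benefit concrete, here is a small sketch that counts how many merge passes a k-way merge needs for a given number of initial files. The function name `merge_passes` and the numbers are illustrative assumptions, not part of the original program.

```python
def merge_passes(run_count, k):
    """Count the k-way merge passes needed to reduce run_count files to one."""
    passes = 0
    while run_count > 1:
        run_count = -(-run_count // k)  # ceiling division: files left after one pass
        passes += 1
    return passes


# With a 4-way merge, cutting 64 initial files down to 16 saves one full pass
# over the data.
print(merge_passes(64, 4))  # 3
print(merge_passes(16, 4))  # 2
```

Every pass reads and writes the whole data set once, so one pass fewer means one full round of disk I/O saved.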

The process is as follows:

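Assume the in-memory workspace holds at most n records. Read n records from the large file into the workspace. Select the record with the smallest key, mark it as lastSmall, and output it to a file (or put it in a buffer first and write the buffer to the file once it reaches a certain size). Then read the next record from the large file into the workspace, select the smallest key that is not smaller than lastSmall, output it to the file (or buffer) that lastSmall went to, and assign the newly selected record to lastSmall. Repeat this step. When no record in the workspace has a key that is at least lastSmall, one split is complete and one small file has been produced. Repeat the whole procedure until the end of the large file is reached, then output the records still in the workspace in the same way.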
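
The steps above can be sketched in memory as follows. This is a minimal illustration, not the article's implementation: it uses Python's `heapq` in place of a loser tree, works on a list instead of a file, and assigns the segment number when a record is read rather than when it is selected. The name `replacement_selection` is an assumption made for this example.

```python
import heapq
from itertools import islice

_END = object()  # marks the end of the input


def replacement_selection(records, n):
    """Split records into sorted runs using a workspace of at most n records."""
    source = iter(records)
    # Each heap entry is (segment number, key); the first n records start in segment 1.
    heap = [(1, key) for key in islice(source, n)]
    heapq.heapify(heap)
    runs = []
    current = 0
    while heap:
        segment, key = heapq.heappop(heap)  # smallest record of the lowest segment
        if segment != current:              # the previous run is finished
            runs.append([])
            current = segment
        runs[-1].append(key)                # key plays the role of lastSmall
        nxt = next(source, _END)            # read one replacement record
        if nxt is not _END:
            # A record smaller than lastSmall has to wait for the next run.
            heapq.heappush(heap, (segment if nxt >= key else segment + 1, nxt))
    return runs


print(replacement_selection([5, 3, 8, 1, 9, 2, 7], 3))
# [[3, 5, 8, 9], [1, 2, 7]]
```

With a workspace of 3 records, the first run already holds 4 records, which is why this method produces fewer files than sorting fixed chunks of n records.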

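Selecting the record with the smallest key in the workspace can be done with a loser tree. The comparison just has to be redefined: besides its key, each record carries a segment number that takes priority (the record with the larger segment number is the larger one; if the segment numbers are equal, the record with the larger key is the larger one). That is, when the key of the newly selected smallest record is smaller than the key of lastSmall, increment that record's segment number and readjust the loser tree.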
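
The comparison rule can be illustrated with plain tuples, since Python compares tuples field by field. In this sketch the workspace is a list of (segment number, key) pairs and `min` stands in for the loser tree; the data and variable names are made up for the example.

```python
# Each record is a (segment number, key) pair; tuple comparison checks the
# segment number first and the key only when the segment numbers are equal.
workspace = [(1, 'delta'), (1, 'kiwi'), (1, 'apple')]
last_small = 'cherry'  # key of the record most recently written to the current file

winner = min(workspace)  # (1, 'apple')
if winner[1] < last_small:
    # 'apple' cannot follow 'cherry' in the same file: raise its segment number.
    workspace[workspace.index(winner)] = (winner[0] + 1, winner[1])

new_winner = min(workspace)
print(new_winner)  # (1, 'delta')
```

After the increment, 'apple' is larger than every record of segment 1, so it only wins once the current file is finished.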

Full code:

#!/usr/bin/python
# Filename: ReplaceSelection.py


#---------------------------------Data Struct----------------------------------
class RSNode:
	'''The struct of the Replace_Selection method'''
	def __init__(self, rowNum, value):
		self.rowNum = rowNum
		self.value = value
#---------------------------------Loser Tree-----------------------------------

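def createLoserTree(loserTree, dataArray, n):
    '''Build a loser tree over n placeholder records.'''
    for i in range(n):
        loserTree.append(0)
        # Placeholders use the segment numbers -n .. -1, so each one is smaller
        # than any real record (rowNum >= 1) and leaf i stays the winner until
        # it is replaced. Values stay strings, so comparisons never mix types.
        dataArray.append(RSNode(i - n, ''))

    for i in range(n):
        adjust(loserTree, dataArray, n, n - 1 - i)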

def adjust(loserTree, dataArray, n, s):
    '''Replay the matches from leaf s up to the root; loserTree[0] becomes the winner.'''
    t = (s + n) // 2
    while t > 0:
        # rowNum has a higher priority than value.
        if dataArray[s].rowNum > dataArray[loserTree[t]].rowNum:
            s, loserTree[t] = loserTree[t], s
        elif (dataArray[s].rowNum == dataArray[loserTree[t]].rowNum
              and dataArray[s].value > dataArray[loserTree[t]].value):
            s, loserTree[t] = loserTree[t], s
        t //= 2
    loserTree[0] = s
#-------------------------------------Use---------------------------------------

from time import ctime

# Append the buffered lines to a file, then empty the buffer.
def writeFile(tarDir, tmp):
    with open(tarDir, 'a') as file_writer:
        file_writer.writelines(tmp)
    # Clear the list in place so the caller's buffer is emptied too.
    del tmp[:]

import os

# rowNum given to an exhausted slot: larger than any real segment number,
# so such a slot only wins once no real record is left.
END_ROW = float('inf')

def readRecord(file_reader, rowNum):
    '''Read the next line as an RSNode, or return an end marker at end of file.'''
    line = file_reader.readline()
    if not line:
        return RSNode(END_ROW, '')
    if not line.endswith('\n'):
        line += '\n'  # The last line may lack a newline; keep one record per line.
    return RSNode(rowNum, line)

def splitFile(fileLocation, tarDirectory, n):
    n = int(n)
    loserTree = []
    dataArray = []
    createLoserTree(loserTree, dataArray, n)
    lastRowNum = 1    # Segment number of the current file, used to name it.
    lastSmall = None  # Last record output to the current file.
    tmp = []          # Temporary buffer of sorted lines waiting to be written.

    def runFile():
        return os.path.join(tarDirectory, 'file' + str(lastRowNum) + '.txt')

    with open(fileLocation, 'r') as file_reader:
        # First, fill the data array with the first n records of the file.
        for i in range(n):
            dataArray[i] = readRecord(file_reader, 1)
            # Adjust the loser tree after every change of the data array.
            adjust(loserTree, dataArray, n, i)

        while True:
            winner = dataArray[loserTree[0]]
            if winner.rowNum == END_ROW:
                break  # The input is exhausted and the workspace is drained.
            if winner.rowNum > lastRowNum:
                # No record of the current segment is left: finish its file.
                if tmp:
                    writeFile(runFile(), tmp)
                lastRowNum = winner.rowNum
                lastSmall = None
            if lastSmall is not None and winner.value < lastSmall.value:
                # Too small for the current file: move it to the next segment.
                winner.rowNum += 1
            else:
                lastSmall = winner
                tmp.append(winner.value)
                # Write tmp into the file once it reaches the maximum size.
                if len(tmp) == n:
                    writeFile(runFile(), tmp)
                # Replace the record just output. At end of file this puts in
                # an end marker, which drains the records left in the tree.
                dataArray[loserTree[0]] = readRecord(file_reader, lastRowNum)
            adjust(loserTree, dataArray, n, loserTree[0])

    # And don't forget tmp. If tmp is not empty, write it into the file.
    if tmp:
        writeFile(runFile(), tmp)
#----------------------------Test-------------------------------------
if __name__ == '__main__':
    import sys
    try:
        fileLocation = sys.argv[1]
        tarDir = sys.argv[2]
        n = sys.argv[3]
    except IndexError:
        print('Wrong Arguments!')
        print('''You need 3 parameters in total.
    1. The path of your file.
    2. The path of the target files.
    3. The size of the LoserTree.

You should do it like this:
    python ReplaceSelection.py /root/hehe.txt /root/hehe/ 6''')
        sys.exit(1)
    timeNow = ctime()
    print('Now the time is ' + str(timeNow) + ',')
    print('and the work is starting, please wait...')
    splitFile(fileLocation, tarDir, n)
    print('Work Over!')
    print('Now the time is ' + str(ctime()))
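
After a run of the program, it can be useful to confirm that every generated file is sorted. The helper below is a hypothetical addition, not part of the original script; it assumes the output files follow the `file<number>.txt` naming used above and compares lines as plain strings, as the splitting code does.

```python
import glob
import os


def check_runs(directory):
    """Map each file*.txt name in directory to True if its lines are in sorted order."""
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, 'file*.txt'))):
        with open(path) as f:
            lines = f.readlines()
        # A file is sorted when no line is larger than the line after it.
        result[os.path.basename(path)] = all(
            a <= b for a, b in zip(lines, lines[1:]))
    return result
```

For example, `check_runs('/root/hehe/')` would return a dictionary with one entry per generated file; any `False` value points to a file that is out of order.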