The Road to Mathematics - Python Computing in Practice (4) - Lempel-Ziv Compression (2)

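This article walks through an implementation of the Lempel-Ziv compression algorithm in Python and uses real examples to show how to compress and decompress text files. In the experiment shown, a 36K text file shrinks to 30K.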

Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

All content on this blog is original. If you repost it, please credit the source:

http://blog.youkuaiyun.com/myhaspl/

Format	C Type	Python type	Standard size	Notes
`x`	pad byte	no value
`c`	`char`	string of length 1	1
`b`	`signed char`	integer	1	(3)
`B`	`unsigned char`	integer	1	(3)
`?`	`_Bool`	bool	1	(1)
`h`	`short`	integer	2	(3)
`H`	`unsigned short`	integer	2	(3)
`i`	`int`	integer	4	(3)
`I`	`unsigned int`	integer	4	(3)
`l`	`long`	integer	4	(3)
`L`	`unsigned long`	integer	4	(3)
`q`	`long long`	integer	8	(2), (3)
`Q`	`unsigned long long`	integer	8	(2), (3)
`f`	`float`	float	4	(4)
`d`	`double`	float	8	(4)
`s`	`char[]`	string
`p`	`char[]`	string
`P`	`void *`	integer		(5), (3)
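
As a quick check of the "Standard size" column, the sizes can be queried with `struct.calcsize`. This is a minimal sketch in Python 3 syntax (the article's own listing is Python 2); the variable names `standard` and `native_long` are illustrative only.

```python
import struct

# Standard sizes: a leading '<', '>', '!' or '=' fixes the size of every field.
standard = {code: struct.calcsize("<" + code) for code in "bhilqfd"}
print(standard)

# Native size: with no prefix the size depends on the platform and compiler,
# so `long` is typically 4 or 8 bytes here.
native_long = struct.calcsize("l")
print(native_long)
```

With a standard-size prefix, `l` is always 4 bytes, which is why a file format meant to be portable would normally use one of the prefixes.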

struct.pack(fmt, v1, v2, ...)

Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.
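
A small illustration of `pack`, written as a sketch in Python 3 syntax, where the result is a `bytes` object rather than the Python 2 string described above. The format `"<HI"` and the values are arbitrary choices for the example.

```python
import struct

# '<' = little-endian, standard sizes; H = unsigned short (2 bytes),
# I = unsigned int (4 bytes).
packed = struct.pack("<HI", 1, 258)
print(repr(packed))   # b'\x01\x00\x02\x01\x00\x00'
print(len(packed))    # 6

# The arguments must match the format exactly: too few values is an error.
try:
    struct.pack("<HI", 1)
except struct.error as exc:
    print("pack failed:", exc)
```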

struct.unpack(fmt, string)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
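
The two rules in that description (the length must match `calcsize(fmt)`, and the result is always a tuple) can be seen in a short round trip. Again this is only a sketch in Python 3 syntax with made-up values.

```python
import struct

fmt = "<HI"
data = struct.pack(fmt, 7, 9940)

# The buffer length must equal calcsize(fmt).
assert len(data) == struct.calcsize(fmt)

# The result is a tuple, even when the format describes a single item.
pair = struct.unpack(fmt, data)
single = struct.unpack("<H", data[:2])
print(pair)    # (7, 9940)
print(single)  # (7,)

# A buffer of the wrong length is rejected.
try:
    struct.unpack(fmt, data[:-1])
except struct.error as exc:
    print("unpack failed:", exc)
```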

The program reads a text file, compresses it, and then decompresses it. Part of the code is shown below:

# -*- coding: utf-8 -*-
# Lempel-Ziv algorithm
#code:myhaspl@myhaspl.com
import struct
mystr=""
print "\nReading the source file".decode("utf8")
# Read the whole source file; the with-block closes it even if read() fails
with open('test2.txt','r') as mytextfile:
    mystr=mytextfile.read()
my_str=mystr
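# Codeword table: maps each phrase to its index
codeword_dictionary={}
# Length of the text to compress
str_len=len(my_str)
# Maximum codeword length
dict_maxlen=1
# Position of the next text segment to parse (start of the next parse)
now_index=0
# Largest index in the codeword table
max_index=0

# Compressed data
print "\nGenerating compressed data".decode("utf8")
compresseddata=[]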
while (now_index<str_len):
    # Number of characters to advance by
    mystep=0
    # Current match length to try, starting from the longest codeword
    now_len=dict_maxlen
    if now_len>str_len-now_index:
        now_len=str_len-now_index
    # Codeword-table index that was found; 0 means nothing was found
    cw_addr=0
    while (now_len>0):
        cw_index=codeword_dictionary.get(my_str[now_index:now_index+now_len])
        if cw_index is not None:
            # Codeword found
            cw_addr=cw_index
            mystep=now_len
            break
        now_len-=1
    if cw_addr==0:
        # No codeword found: add a new single-character codeword
        max_index+=1
        mystep=1
        codeword_dictionary[my_str[now_index:now_index+mystep]]=max_index
        print "don't find the Code word,add Code word:%s index:%d"%(my_str[now_index:now_index+mystep],max_index)
    else:
        # Codeword found: add a new codeword (the match plus the next character)
        max_index+=1
        if now_index+mystep+1<=str_len:
            codeword_dictionary[my_str[now_index:now_index+mystep+1]]=max_index
            if mystep+1>dict_maxlen:
                dict_maxlen=mystep+1
        print "find the Code word:%s  add Code word:%s index:%d"%(my_str[now_index:now_index+now_len],my_str[now_index:now_index+mystep+1],max_index)
.......
......
        my_codeword_dictionary[my_maxindex]=my_codeword_dictionary[cwkey]+cwlaster        
        uncompressdata.append(my_codeword_dictionary[cwkey])
        uncompressdata.append(cwlaster)     
    print ".",
uncompress_str=uncompress_str.join(uncompressdata)
uncompressstr=uncompress_str
print "\nWriting the decompressed result to file..\n".decode("utf8")
uncompress_file= open('uncompress.txt','w')
try:
    uncompress_file.write(uncompressstr)
    print "\nDecompression succeeded; output written to uncompress.txt!\n".decode("utf8")
finally:
    uncompress_file.close()
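
Because the listing above is only an excerpt, here is a compact, self-contained sketch of the same idea in Python 3. It is a textbook LZ78-style scheme that emits (index, character) pairs, which is what the decompression fragment above appears to consume; it is not the author's program, and the function names `lz78_compress` and `lz78_decompress` are made up for this illustration.

```python
def lz78_compress(text):
    """Return a list of (index, char) pairs; index 0 means 'empty prefix'."""
    dictionary = {}   # phrase -> index, starting at 1
    pairs = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate   # keep extending the current match
        else:
            # Emit the longest known prefix plus the new character,
            # then register the extended phrase as a new codeword.
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known phrase: emit it with no character.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decompress(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    phrases = {0: ""}
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases[len(phrases)] = phrase
    return "".join(out)


text = "abababab"
pairs = lz78_compress(text)
print(pairs)   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, '')]
assert lz78_decompress(pairs) == text
```

Unlike the listing above, which looks up slices from the longest codeword length downwards, this sketch grows the match one character at a time. Either way the decoder can rebuild the table from the pairs alone, so the table does not need to be stored.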

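Next, we compress the text of the Chinese Wikipedia article on Python:

Running the program first compresses the text into a compressed file, then opens that file and decompresses it.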

$ pypy lempel-ziv-compress.py python.txt python.lzv

………………..

find the Code word: C add Code word: CP index:9938

find the Code word:ython add Code word:ython index:9939

find the Code word:

^ add Code word:

^ h index:9940

find the Code word:ttp add Code word:ttp: index:9941

find the Code word:// add Code word://e index:9942

find the Code word:dit add Code word:ditr index:9943

find the Code word:a. add Code word:a.o index:9944

Generating the compressed-data header

Writing the compressed data to the compressed file

…………….

. . . . . . . . . . . . . . . . . . . . . . …………….

Writing the decompressed result to file..

Decompression succeeded; output written to uncompress.txt!
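
The run above reports writing a header followed by the compressed data, but that part of the program is not shown. Purely as an illustration of how `struct` could serve this purpose, the sketch below (Python 3 syntax) uses a hypothetical layout: a record count, then one fixed-size record per (index, byte) pair. The names `HEADER`, `RECORD`, `NO_SYMBOL`, `pack_pairs` and `unpack_pairs` are assumptions, not the article's file format.

```python
import struct

HEADER = "<I"     # hypothetical header: number of records
RECORD = "<IH"    # hypothetical record: dictionary index, symbol value
NO_SYMBOL = 256   # marks a final record that carries no trailing byte


def pack_pairs(pairs):
    """Serialize (index, byte-or-None) pairs into one bytes object."""
    chunks = [struct.pack(HEADER, len(pairs))]
    for index, symbol in pairs:
        value = NO_SYMBOL if symbol is None else symbol
        chunks.append(struct.pack(RECORD, index, value))
    return b"".join(chunks)


def unpack_pairs(blob):
    """Inverse of pack_pairs."""
    (count,) = struct.unpack_from(HEADER, blob, 0)
    offset = struct.calcsize(HEADER)
    size = struct.calcsize(RECORD)
    pairs = []
    for _ in range(count):
        index, value = struct.unpack_from(RECORD, blob, offset)
        pairs.append((index, None if value == NO_SYMBOL else value))
        offset += size
    return pairs


pairs = [(0, 97), (0, 98), (1, 98), (3, 97), (2, None)]
blob = pack_pairs(pairs)
assert unpack_pairs(blob) == pairs
print(len(blob))   # 4 + 5 * 6 = 34
```

Fixed six-byte records keep the sketch simple but are wasteful. A real format would likely choose narrower fields, since the record width directly limits how much the file can shrink.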

Check the compression result:

$ ls -l -h

…………….

-rw-rw-r-- 1 deep deep 5.0K Jul 1 20:55 lempel-ziv-compress.py

-rw-rw-r-- 1 deep deep 30K Jul 1 20:55 python.lzv

-rw-rw-r-- 1 deep deep 36K Jul 1 20:57 python.txt

-rw-rw-r-- 1 deep deep 36K Jul 1 20:55 uncompress.txt

As the listing above shows, the file is 36K before compression and 30K after.
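
To put a number on that, the ratio can be worked out directly. The sizes reported by `ls -h` are rounded, so the figures below are approximate; `compression_summary` is a helper name invented for this sketch.

```python
def compression_summary(original_bytes, compressed_bytes):
    """Return (compressed/original ratio, fraction of space saved)."""
    ratio = compressed_bytes / float(original_bytes)
    return ratio, 1.0 - ratio


ratio, saving = compression_summary(36 * 1024, 30 * 1024)
print("ratio: %.1f%%, saving: %.1f%%" % (ratio * 100, saving * 100))
# ratio: 83.3%, saving: 16.7%
```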

Compressing the complete source code of SQLite 3.8.5

$ pypy lempel-ziv-compress.py sqlitesrc.txt sqlitesrc.lzv

Check the compression result: