I've started learning Python, partly to prepare for that similarity toolkit. Here is a small test program:
```python
# corpor file reader
# author: percylee
# time: 2006/08


class CorporFileReader:
    """reader for corpor file, which is labeled just like pku-corpor of
    Renmin Ribao. e.g. 在/p 1998年/t 来临/v 之际/f ,/w ..."""

    def __init__(self, fpath, splitstr=' '):
        """need file path to init CorporFileReader"""
        self.fpath = fpath
        self.title = ''
        self.document = ''
        self.docwordlist = []
        self.splitstr = splitstr

    def __docTitle(self):
        """Title = the tokens after the article ID, up to the first '/w' token."""
        if len(self.docwordlist) <= 2:
            return None
        words = []
        for word in self.docwordlist[1:]:  # [0] is '199801-.../m'
            if word.find('/w') >= 0:  # stop at the first punctuation token
                break
            words.append(word)
        return ' '.join(words)  # one space between words, none trailing

    def read(self):
        """read title and document from corpor file"""
        # Pass encoding=... to open() if the file is not in the locale's default encoding.
        with open(self.fpath) as f:
            self.document = f.read()
        self.docwordlist = self.document.split(self.splitstr)
        self.title = self.__docTitle()

    def docTitle(self):
        """get document title"""
        return self.title


# test class
if __name__ == '__main__':
    print('...in test...')
    corporader = CorporFileReader('g://pyCode//pkucorpora1.txt')
    print('create one object of ' + corporader.__doc__)
    corporader.read()
    print("and read one document which's title is", corporader.docTitle())
```
The output:
```
...in test...
create one object of reader for corpor file, which is labeled just like pku-corpor of
Renmin Ribao. e.g. 在/p 1998年/t 来临/v 之际/f ,/w ...
and read one document which's title is 迈向/v 充满/v 希望/n 的/u 新/a 世纪/n
```
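The title rule can also be tried without the corpus file. Below is a minimal sketch (Python 3) that applies the same rule as `__docTitle` (skip the article ID, collect tokens until the first `/w` punctuation token) to an in-memory string, then drops the POS tags to get the plain title. The helper names `doc_title` and `strip_tags` and the sample line are hypothetical. The sample is made up for illustration, modeled on the title in the output above. The sketch assumes whitespace-separated tokens and calls `split()` with no argument, so runs of several spaces do not produce empty tokens.

```python
def doc_title(text):
    """Return the tagged title of a PKU-style document, or None.

    Hypothetical helper mirroring CorporFileReader.__docTitle: token [0]
    is the article ID (e.g. '199801-.../m'); the title is every token after
    it, up to but not including the first punctuation token tagged '/w'.
    """
    tokens = text.split()  # any run of whitespace separates tokens
    if len(tokens) <= 2:
        return None
    title_words = []
    for token in tokens[1:]:
        if '/w' in token:  # first punctuation token ends the title
            break
        title_words.append(token)
    return ' '.join(title_words)


def strip_tags(tagged_text):
    """Drop the '/tag' suffix from each 'word/tag' token and join the words.

    Hypothetical helper: splits on the last '/', and keeps a token that
    contains no '/' unchanged.
    """
    words = []
    for token in tagged_text.split():
        word, sep, _tag = token.rpartition('/')
        words.append(word if sep else token)
    return ''.join(words)  # Chinese text needs no spaces between words


# Illustrative sample line, made up for this sketch.
sample = ('19980101-01-001-001/m  迈向/v  充满/v  希望/n  的/u  新/a  世纪/n  '
          '——/w  一九九八年/t  新年/t  讲话/n')
title = doc_title(sample)
print(title)              # 迈向/v 充满/v 希望/n 的/u 新/a 世纪/n
print(strip_tags(title))  # 迈向充满希望的新世纪
```

Joining the bare words with an empty string suits Chinese, which does not put spaces between words. For other languages, `' '.join` would be the natural choice.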
Very simple. But Python really is fun; it has been a long time since I last enjoyed the simple pleasure of learning to program ^_^.