Domain filter: read the file named by daminFile line by line and record every line whose domain matches the rule.
import re

# Ask for the path of the input file (one domain per line)
daminFile = input('File path:\n')
# Path of the result file
resultFile = daminFile + '_rs.txt'
fileRead = open(daminFile, 'r')
fileWrite = open(resultFile, 'w')
# Set the rule: 1 to 5 word characters, then .com, .net or .org
regex = r'^\w{1,5}\.(com|net|org)$'
# Check each line
for eachLine in fileRead:
    print(eachLine)
    # '$' also matches just before the trailing newline, so no strip is needed
    if re.match(regex, eachLine):
        fileWrite.write(eachLine)
fileWrite.close()
fileRead.close()
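This was the first time I used Python for a real need: I wanted to register a domain name and pick one from the most recently expired domains. With hundreds of thousands of records every day, picking by hand is impossible, so I used Python to first select the com/net/org domains whose name is at most 5 characters long.
Not counting comments and blank lines, that is 12 lines of code. Python really does save a lot of effort.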
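The same rule can also be wrapped in a small function, so it can be tried on a few sample lines before running it over a full day's list. This is only a sketch for Python 3: `filter_domains`, `DOMAIN_RULE` and the sample names are made up for illustration, and unlike the script above it strips each line before matching.

```python
import re

# Same rule as the script above: 1 to 5 word characters, then .com, .net or .org
DOMAIN_RULE = re.compile(r'^\w{1,5}\.(com|net|org)$')

def filter_domains(lines):
    """Yield the domains in `lines` that satisfy DOMAIN_RULE (hypothetical helper)."""
    for line in lines:
        domain = line.strip()  # drop the trailing newline and stray spaces
        if DOMAIN_RULE.match(domain):
            yield domain

# Made-up sample input, shaped like lines read from the expired-domain file
sample = ['abc.com\n', 'toolong.com\n', 'ab1.net\n', 'abc.info\n', 'a-b.org\n']
short_domains = list(filter_domains(sample))
print(short_domains)  # ['abc.com', 'ab1.net']
```

Note that `\w` accepts letters, digits and underscores but not hyphens, so a name such as a-b.org is skipped by this rule.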
----------------------------------Divider---------------------------------------
Remove the .svn folders from a project directory:
import os, shutil

def walk(path):
    # Go through every entry directly under path
    for fileobj in os.listdir(path):
        if fileobj == '.svn':
            # Delete the .svn folder and everything inside it
            shutil.rmtree(os.path.join(path, fileobj))
        else:
            filepath = os.path.join(path, fileobj)
            isDir = os.path.isdir(filepath)
            if isDir:
                print(filepath)
                # Recurse into the subdirectory
                walk(filepath)

def getInput():
    dirPath = input('Enter the dir path(eg:"e:/test"):\n')
    if os.path.isdir(dirPath):
        walk(dirPath)
    else:
        print('there is not a directory named ' + dirPath)
        # Ask again until an existing directory is entered
        getInput()

if __name__ == '__main__':
    getInput()
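For comparison, the same cleanup can be sketched with `os.walk` from the standard library, which handles the directory traversal itself. This is a minimal Python 3 sketch rather than the script above: `remove_svn_dirs` is a hypothetical name, and it returns the deleted paths so the result can be checked.

```python
import os
import shutil

def remove_svn_dirs(root):
    """Delete every .svn directory under root and return the deleted paths."""
    removed = []
    # os.walk visits root top-down and lists each directory's subdirectories
    for dirpath, dirnames, filenames in os.walk(root):
        if '.svn' in dirnames:
            target = os.path.join(dirpath, '.svn')
            shutil.rmtree(target)
            # Remove the name in place so os.walk does not try to descend into it
            dirnames.remove('.svn')
            removed.append(target)
    return removed
```

Calling `remove_svn_dirs('e:/test')` would clean the example directory mentioned in the prompt of the script above.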