weixin_38146606 - CSDN Blog

[Original] Python Chinese Word Segmentation

1. Install jieba: open cmd and run easy_install jieba (with current tooling, pip install jieba)
2. Word segmentation
3. Keyword extraction
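
Here is a minimal forward-maximum-matching segmenter that shows what word segmentation does without needing jieba installed. It is a swapped-in textbook technique, not jieba's algorithm. jieba builds a word graph from a prefix dictionary, picks the most probable path with dynamic programming, and uses an HMM for unknown words. The names fmm_segment and DICTIONARY are hypothetical, chosen for illustration.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it; a single character is
        # always accepted, so unknown characters become one-character words.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical toy dictionary, for illustration only
DICTIONARY = {"我们", "喜欢", "中文", "分词", "中文分词"}

print(fmm_segment("我们喜欢中文分词", DICTIONARY))  # ['我们', '喜欢', '中文分词']
```

With jieba installed, jieba.lcut(text) returns the segmentation as a list. After import jieba.analyse, jieba.analyse.extract_tags(text, topK=5) returns the top keywords.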

2017-04-16 15:50:40 | 381 views

[Original] Common Regular Expressions

1. Capture the characters between two markers: begin(.+?)end (at least one character, non-greedy/minimal match)
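
The begin(.+?)end pattern can be wrapped in a small helper. The name between and its default markers are hypothetical. re.escape lets the markers contain regex metacharacters. The last line contrasts the non-greedy match with a greedy .+, which runs to the last "end" in the string.

```python
import re

def between(text, start="begin", end="end"):
    """Return every non-empty substring found between start and end markers."""
    # re.escape keeps markers such as '(' or '.' from acting as regex syntax.
    # '.' does not match newlines; pass re.DOTALL to findall if text spans lines.
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    return re.findall(pattern, text)

text = "beginHelloend and beginWorldend"
print(between(text))                            # ['Hello', 'World']
print(between("<b>bold</b> <b>more</b>", "<b>", "</b>"))  # ['bold', 'more']

# Greedy version for comparison: .+ swallows everything up to the LAST "end"
print(re.findall(r"begin(.+)end", text))        # ['Helloend and beginWorld']
```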

2017-04-15 12:53:04 | 205 views

[Original] Python Regular Expressions

1. Basic usage:
    import re
    expression = re.compile(reg)
    result = re.findall(expression, text)
2. Common regular expressions
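
Here is the basic usage with a concrete pattern. The pattern \d+ and the sample text are illustrative choices. re.findall accepts a compiled pattern as well as a string. The compiled object's own findall method gives the same result.

```python
import re

# Compile once and reuse; useful when the same pattern runs many times
number_re = re.compile(r"\d+")

text = "id=12, id=345"
print(re.findall(number_re, text))  # ['12', '345']  module-level function
print(number_re.findall(text))      # ['12', '345']  equivalent method call
```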

2017-04-15 12:41:49 | 256 views

[Original] Python Dictionaries

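If you need to look things up in a collection of data, use a dictionary! It is genuinely handy: looking up a key takes constant time on average. Example: count how many times each id appears:
    count = {}
    if id in count:
        count[id] += 1
    else:
        count[id] = 1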
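
The counting snippet runs once per id, so it needs a loop around it. Here is a runnable sketch assuming the ids arrive as a list. The names count_ids and ids are hypothetical, and item_id replaces id to avoid shadowing the built-in id(). collections.Counter from the standard library gives the same counts.

```python
from collections import Counter

def count_ids(ids):
    """Count how often each id occurs, using the if/else pattern from the post."""
    count = {}
    for item_id in ids:          # item_id avoids shadowing the built-in id()
        if item_id in count:     # average O(1) membership test on a dict
            count[item_id] += 1
        else:
            count[item_id] = 1
    return count

ids = ["a", "b", "a", "c", "a"]  # hypothetical sample data
print(count_ids(ids))            # {'a': 3, 'b': 1, 'c': 1}
print(dict(Counter(ids)))        # {'a': 3, 'b': 1, 'c': 1}
```

The if/else can also be collapsed to count[item_id] = count.get(item_id, 0) + 1.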

2017-04-10 11:37:46 | 228 views

[Original] Python Web Page Fetching

1. Basic method:
    import urllib.request
    page = urllib.request.urlopen(url)
    html = page.read()   # bytes; decode before treating it as text
2. Imitating a browser:
    hdr={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23
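
This sketch combines both methods: it attaches a User-Agent header through urllib.request.Request and decodes the response body. The header value in BROWSER_HEADERS is a shortened placeholder, since the post's full string is cut off. build_request and fetch_html are hypothetical helper names. The charset comes from the Content-Type response header, with UTF-8 as an assumed fallback.

```python
import urllib.request

# Placeholder browser-like header; the original User-Agent string is truncated
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def build_request(url, headers=None):
    """Wrap url in a Request carrying browser-like headers."""
    return urllib.request.Request(url, headers=headers or BROWSER_HEADERS)

def fetch_html(url, timeout=10):
    """Download url and return its body decoded to str."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as page:
        raw = page.read()  # bytes, exactly as sent by the server
        # Charset from the Content-Type header, falling back to UTF-8
        charset = page.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Usage, with network access: html = fetch_html("https://example.com").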

2017-04-02 20:13:26 | 348 views

[Original] Python Random Number Generation

1. Random integer:
    import random
    rand = random.randint(min,max)   # min <= rand <= max, both ends included
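
This sketch draws several random integers at once. It uses a private random.Random instance so that a fixed seed reproduces the same sequence, which helps when testing. The helper random_ints is a hypothetical name.

```python
import random

def random_ints(count, low, high, seed=None):
    """Return count integers drawn uniformly from low..high, both ends included."""
    rng = random.Random(seed)  # private generator; same seed, same sequence
    return [rng.randint(low, high) for _ in range(count)]

rolls = random_ints(10, 1, 6, seed=42)  # ten dice rolls
print(rolls)
```

By contrast, random.randrange(min, max) excludes max, and random.random() returns a float in [0.0, 1.0). For passwords or tokens, the secrets module is the appropriate choice.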

2017-04-02 20:01:16 | 287 views

[Original] Python English Tokenization

1. Split on whitespace/punctuation:
    pattern = r'''(?x)    # set flag to allow verbose regexps
        ([A-Z]\.)+        # abbreviations, e.g. U.S.A.
      | \w+(-\w+)*        # words with optional internal hyphens
      | \$?\d+(\
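
This is a complete verbose-mode tokenizer in the same style. The post's pattern is cut off, so the currency, ellipsis and punctuation alternatives below are assumptions modeled on the well-known NLTK book example. One change matters when the pattern is used with re.findall. A pattern with capturing groups such as ([A-Z]\.)+ makes findall return the group contents rather than the whole token, so every group here is non-capturing (?:...). TOKEN_PATTERN and tokenize are hypothetical names.

```python
import re

TOKEN_PATTERN = re.compile(r'''(?x)   # verbose mode: whitespace and comments are ignored
      (?:[A-Z]\.)+                    # abbreviations, e.g. U.S.A.
    | \$?\d+(?:\.\d+)?%?              # currency and percentages, e.g. $12.40, 82%
    | \w+(?:-\w+)*                    # words with optional internal hyphens
    | \.\.\.                          # ellipsis
    | [.,;:!?"'()\[\]-]               # single punctuation marks
''')

def tokenize(text):
    """Split English text into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("That U.S.A. poster-print costs $12.40..."))
# ['That', 'U.S.A.', 'poster-print', 'costs', '$12.40', '...']
```

Alternatives are tried left to right, so the number alternative sits before the word alternative; otherwise \w+ would claim "82" and drop the "%".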

2017-04-02 10:45:53 | 6536 views

[Original] Python Data Storage

1. Save data:
    import pickle
    file = open(file_path, 'wb')
    pickle.dump(data, file)
    file.close()
2. Load data:
    import pickle
    file = open(file_path, 'rb')
    data = pickle.load(file)
    file.close()
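
The same save/load steps can use with blocks, which close the file automatically even if an exception is raised. This sketch round-trips a small dict through a file in a temporary directory. The helper names save and load, the sample data, and the data.pkl location are hypothetical.

```python
import os
import pickle
import tempfile

def save(data, file_path):
    # 'wb': pickle produces bytes, so the file must be opened in binary mode
    with open(file_path, "wb") as file:
        pickle.dump(data, file)

def load(file_path):
    with open(file_path, "rb") as file:
        return pickle.load(file)

file_path = os.path.join(tempfile.mkdtemp(), "data.pkl")  # hypothetical location
data = {"ids": [1, 2, 3], "name": "demo"}
save(data, file_path)
print(load(file_path) == data)  # True
```

pickle.load can execute arbitrary code while unpickling, so only load files from sources you trust.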

2017-03-31 20:59:08 | 371 views

[Original] Python File Reading and Writing

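1. Open a file: file = open(file_path, mode), where mode is
    'r' - read
    'w' - write
    'b' - binary mode, usable with pickle
2. Read a file:
    text = file.read()          # read the entire contents
    lines = file.readlines()    # read all lines; lines is a list
    line = file.readline()      # read one line at a time
    while lin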
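
This sketch covers the open and read steps, assuming UTF-8 text and a scratch file in a temporary directory. The names read_write_demo and demo.txt are hypothetical. Each read style gets its own with block, because a second read on the same handle would start at the end of the file and return nothing.

```python
import os
import tempfile

def read_write_demo(path):
    """Write two lines to path, then read them back three different ways."""
    with open(path, "w", encoding="utf-8") as f:   # 'w' creates or truncates
        f.write("first line\nsecond line\n")

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()            # whole file as one string

    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()      # list of lines, each still ending in '\n'

    stripped = []
    with open(path, "r", encoding="utf-8") as f:
        line = f.readline()        # one line per call; '' means end of file
        while line:
            stripped.append(line.rstrip("\n"))
            line = f.readline()

    return text, lines, stripped

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file
text, lines, stripped = read_write_demo(path)
print(lines)     # ['first line\n', 'second line\n']
print(stripped)  # ['first line', 'second line']
```

For large files, iterating with for line in f: reads one line at a time and is the more common idiom.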

2017-03-31 20:47:38 | 272 views

[Original] Python Arrays (Lists)

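1. Array of known size:
    arr = [0]*n          # create a one-dimensional list of n zeros
2. Array of unknown size:
    arr = []             # create an empty list
    arr.append([0])      # append one element (here the list [0]) to the end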
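
This sketch shows both cases; make_arrays is a hypothetical helper. It also covers a related pitfall when [0]*n is extended to two dimensions. Multiplying a list of lists copies references to one inner list, so a list comprehension is needed to get independent rows.

```python
def make_arrays(n):
    """Build a fixed-size list of zeros and a list grown with append()."""
    fixed = [0] * n          # known size: n zeros
    growing = []             # unknown size: start empty
    for i in range(n):
        growing.append(i)    # append one element at a time
    return fixed, growing

# 2-D pitfall: [[0] * 3] * 2 repeats the SAME inner list object twice
shared = [[0] * 3] * 2
shared[0][0] = 1             # changes both rows
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 1        # changes only the first row

print(make_arrays(3))        # ([0, 0, 0], [0, 1, 2])
print(shared)                # [[1, 0, 0], [1, 0, 0]]
print(independent)           # [[1, 0, 0], [0, 0, 0]]
```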

2017-03-31 19:51:31 | 338 views
