
NLP
Average article quality score: 77
macb007
New word discovery based on mutual information and information entropy
    from nltk.probability import FreqDist
    f = open(r"C:\Users\machuanbin\Desktop\santi.txt", encoding='utf-8')
    text = f.read()
    stop_word = ['【', '】', ')', '(', '、', ',', '“', '”', '。', '\n', '《', '》', ' ...
Reposted 2018-06-01 15:47:52 · 3014 views · 0 comments
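The preview above is cut off before the scoring logic, so the following is only a minimal sketch of the general technique named in the title, not the article's code. It assumes candidates are character bigrams, scores each by pointwise mutual information (how much more often the two characters co-occur than chance predicts) and by the smaller of its left and right neighbor entropies (how freely it combines with surrounding text). The function name `score_bigrams` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def score_bigrams(text):
    """Return {bigram: (pmi, min(left_entropy, right_entropy))}. Illustrative only."""
    n = len(text)
    unigram = Counter(text)
    bigram = Counter(text[i:i + 2] for i in range(n - 1))

    # Collect the characters seen immediately left and right of each bigram.
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for i in range(n - 1):
        w = text[i:i + 2]
        if i > 0:
            left[w][text[i - 1]] += 1
        if i + 2 < n:
            right[w][text[i + 2]] += 1

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    scores = {}
    for w, c in bigram.items():
        p_w = c / (n - 1)            # bigram probability
        p_a = unigram[w[0]] / n      # first character probability
        p_b = unigram[w[1]] / n      # second character probability
        pmi = math.log(p_w / (p_a * p_b))
        scores[w] = (pmi, min(entropy(left[w]), entropy(right[w])))
    return scores
```

A candidate with both high PMI and high neighbor entropy is a plausible word; the thresholds for "high" are corpus-dependent and are left to the caller.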
Word spelling correction algorithm based on edit distance
    class Candidate(object):
        # WORDS_dict={word:freq}
        def __init__(self,WORDS_dict):
            self.WORDS=WORDS_dict
        def P(self,word):
            "Probability of `word`."
            # print(word,WORD...
Original 2018-06-01 15:51:24 · 1712 views · 0 comments
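Since the preview stops inside `P`, here is a minimal sketch of how an edit-distance corrector of this kind is commonly put together. It is an assumption about the general approach, not the article's implementation. It takes a `{word: freq}` dictionary like the `WORDS_dict` above, generates all strings within one or two edits of the input, and picks the known candidate with the highest relative frequency. The names `edits1` and `correct` are chosen for illustration.

```python
import string


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, words_dict):
    """Return the most frequent known word within two edits, else `word` itself."""
    total = sum(words_dict.values()) or 1

    def known(candidates):
        return {w for w in candidates if w in words_dict}

    # Prefer the word itself, then distance-1 candidates, then distance-2.
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: words_dict.get(w, 0) / total)
```

For example, `correct("speling", {"spelling": 10, "spewing": 1})` finds both dictionary words at distance one and returns the more frequent one.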
seq2seq English-to-French translation
    '''
    # Data download
    English to French sentence pairs.
    http://www.manythings.org/anki/fra-eng.zip
    Lots of neat sentence pairs datasets can be found at:
    http://www.manythings.org/anki/
    # References
    - Se...
Reposted 2018-06-19 17:25:42 · 1016 views · 0 comments
LDA
    #-*- coding:utf8 -*-
    from nltk.tokenize import RegexpTokenizer
    from stop_words import get_stop_words
    from nltk.stem.porter import PorterStemmer
    from gensim.models.ldamodel import LdaModel
    from gensim ...
Original 2018-07-06 10:05:32 · 1189 views · 0 comments