
nlp
salt2020
THE PRICE
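Mounting a local directory in Docker
The path before the colon is the local (host) directory and the path after the colon is the container directory. Both must be written as absolute paths (that is, starting with a slash), otherwise an error is raised:
docker run -it --name ner_gyn -v /opt/wwwroot/atom_guoyanan/lstm_trail:/lstm_trail tensorflow/tensorflow:2.1.0-gpu-py3 /bin/bash
Install a specific version of tensorflow-addons:
pip install tensorflow-addons==0.9.1 (compatible with tensorflo
Original · 2020-06-10 20:01:02 · 251 views · 0 comments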
NTU hw1: predicting PM2.5 with hand-written gradient descent
Homework 1 - PM2.5 Prediction
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
# Start with a fairly simple model:
# 9+1=10 features: all PM2.5 readings from the previous 9 hours, plus a bias
# Clean the training data
# Put all of the PM2.5 values into one list
#%%
def train():
    all_pm25 = []
Original · 2020-05-22 16:21:31 · 276 views · 0 comments
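The "9 hours of PM2.5 plus a bias" feature layout described in the excerpt can be sketched as follows. This is only an illustration: the helper name `make_windows` and the synthetic readings are assumptions, since the post's own data-cleaning code is cut off in the listing.

```python
import numpy as np

def make_windows(series, hours=9):
    """Build (features, targets) from a 1-D PM2.5 series.

    Each row holds `hours` consecutive readings plus a constant 1.0
    (the bias feature); the target is the reading that follows them.
    `make_windows` is a hypothetical helper, not taken from the post.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - hours
    if n <= 0:
        raise ValueError("series is too short for the requested window")
    windows = np.stack([series[i:i + hours] for i in range(n)])
    X = np.hstack([windows, np.ones((n, 1))])  # 9 readings + 1 bias = 10 features
    y = series[hours:]
    return X, y

# Synthetic readings, purely for illustration.
readings = np.arange(12, dtype=float)
X, y = make_windows(readings)
print(X.shape, y.shape)
```

With the features laid out this way, the model is a plain dot product `X @ w`, and the bias is just the last weight.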
Solving a linear model by hand with gradient descent
Suppose we have the following data samples:
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
x_data = [338, 333, 328, 207, 226, 25, 179, 60, 208, 606]
y_data = [640, 633, 619, 393, 428, 27, 193, 66, 226, 1591]
# ydata = b + w*xdata
plt.scatter(x_data, y_data)
p
Original · 2020-05-21 16:00:37 · 242 views · 0 comments
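A minimal sketch of fitting `ydata = b + w*xdata` to the samples above with plain gradient descent on the mean squared error. The post's own update rule is cut off in the listing, so the details here are assumptions: in particular, standardizing `x` first is a choice made so that a single fixed learning rate converges, and the learning rate and iteration count are illustrative.

```python
import numpy as np

x_data = np.array([338, 333, 328, 207, 226, 25, 179, 60, 208, 606], dtype=float)
y_data = np.array([640, 633, 619, 393, 428, 27, 193, 66, 226, 1591], dtype=float)

# Standardize x so both parameters live on a similar scale (an assumption,
# not necessarily what the post does).
x_mean, x_std = x_data.mean(), x_data.std()
z = (x_data - x_mean) / x_std

w, b = 0.0, 0.0
lr = 0.1
for _ in range(1000):
    err = y_data - (b + w * z)           # residuals under the current parameters
    grad_w = -2.0 * np.mean(err * z)     # d(MSE)/dw
    grad_b = -2.0 * np.mean(err)         # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

# Map the parameters back to the original x scale: y = b_orig + w_orig * x
w_orig = w / x_std
b_orig = b - w * x_mean / x_std
print(w_orig, b_orig)
```

Because the loss is a convex quadratic, the result should agree with the closed-form least-squares fit, for example the one from `np.polyfit(x_data, y_data, 1)`.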
AI datasets
Large Movie Review Dataset: This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is addi
Original · 2020-05-15 14:44:43 · 300 views · 0 comments
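Splitting the evaluation set data
The evaluation set is split so that data originally mixed together on a single line ends up one record per line, which makes automated (or at least semi-automated) evaluation easier. First split the evaluation data, storing name and des in separate files:
fin = open("序列标注")
fout1 = open("val_name", mode="w", encoding="utf8")
fout2 = open("val_des", mode="w", encoding="utf8")
for l in f...
Original · 2020-02-11 22:31:40 · 380 views · 0 comments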
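Time spent on recall
I had always assumed that the similarity computation was the expensive part, but recall turned out to account for most of the time:
def all_entity_dict():
    tag_sys_path = "tags_clean.txt"
    # tag_sys_path = "/workdir/data/tags_clean.txt"
    f = open(tag_sys_path,"r",encoding="utf8")
    # For merge_all...
Original · 2020-02-05 23:24:38 · 354 views · 0 comments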
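One way to confirm which stage dominates is to time recall and scoring separately. The sketch below is an assumption throughout: `recall_candidates` and `score_candidates` are hypothetical stand-ins, not the post's functions, and the entity list is toy data.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def recall_candidates(query, entities):
    # Hypothetical recall stage: keep entities that contain the query string.
    return [e for e in entities if query in e]

def score_candidates(query, candidates):
    # Hypothetical scoring stage: prefer the shortest matching entity.
    return sorted(candidates, key=len)

entities = ["deep learning", "machine learning", "database"]  # toy data
timings = {}
candidates, timings["recall"] = timed(recall_candidates, "learning", entities)
ranked, timings["scoring"] = timed(score_candidates, "learning", candidates)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.6f}s")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.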
LDA in practice with gensim, part 2
import pandas as pd
import jieba
from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamodel import LdaModel
def jieba_add_words():
    """...
Original · 2020-01-19 14:50:28 · 352 views · 1 comment
Getting pretrained character vectors from a vocabulary
Getting pretrained character vectors from a vocabulary:
import pickle
import tqdm
import numpy as np
from nlutools import tools as nlu
def gene_embedding():
    vocab_path = "./word2id.pkl"
    with open(vocab_path, 'rb') as f:
        word...
Original · 2020-01-13 15:31:30 · 455 views · 0 comments
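The general idea, one embedding row per vocabulary id filled from pretrained vectors where available, can be sketched as below. The excerpt is truncated, so this is not the post's `gene_embedding`: the function name `build_embedding`, the in-memory `pretrained` dict (standing in for whatever lookup the post uses), and the random initialization for missing entries are all assumptions. Ids are assumed to be contiguous from 0.

```python
import numpy as np

def build_embedding(word2id, pretrained, dim, seed=0):
    """Return a (vocab_size, dim) float32 matrix indexed by vocabulary id.

    Rows for tokens without a pretrained vector keep a small random
    initialization (an assumption; other schemes use zeros).
    """
    rng = np.random.default_rng(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(word2id), dim)).astype(np.float32)
    for word, idx in word2id.items():
        vec = pretrained.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Toy vocabulary and vectors, purely for illustration.
word2id = {"<PAD>": 0, "天": 1, "气": 2}
pretrained = {"天": [0.1, 0.2], "气": [0.3, 0.4]}
embedding = build_embedding(word2id, pretrained, dim=2)
print(embedding.shape)
```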
Saving and loading models in TensorFlow
Based on tf 1.12.0. Saving the model:
import tensorflow as tf
import numpy as np
## Save the model
W = tf.Variable([[1,2,3],[1,2,3]],dtype=tf.float32, name="weights")
b = tf.Variable([[1,1,1]], dtype=tf.float32, name="biases")...
Original · 2020-01-13 11:13:27 · 248 views · 0 comments
bilstm-crf
model.py
import numpy as np
import os, time, sys
import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell
from tensorflow.contrib.crf import crf_log_likelihood
from tensorflow.contrib.crf i...
Original · 2020-01-13 15:33:44 · 466 views · 0 comments
Loading a bin-format word vector model with gensim
import gensim
filepath = "/opt/wwwroot/atom_guoyanan/data/vector2.0/fasttext.bin"
model = gensim.models.fasttext.load_facebook_vectors(filepath)
print(model['核'])
[ 0.1335077 0.9915103 0.28807437 0.7358422 ...
Original · 2020-01-13 15:18:17 · 1603 views · 0 comments
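Entity linking
Suppose an entity sequence (a mention) has already been found in a piece of text; the next step is to link that sequence to a specific entity. Linking strategy:
Compute the tf-similarity between the sequence and every entity, and recall the entities (and their aliases) whose score is above the 0.5 threshold
Compute the cosine similarity between the sequence and each entity: 0.5*simi(sequence, entity) + 0.5*top_simi(sequence, aliases)
The entity with the top-1 score above is the one the sequence is finally linked to
code:
import logging, os
from tqdm import ...
Original · 2020-01-06 17:38:54 · 553 views · 0 comments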
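The scoring rule above, half the similarity to the entity name plus half the best similarity to any alias, can be sketched as follows. The post's code is truncated, so this is an illustration under assumptions: similarity here is cosine over character counts for both the recall and the scoring step, and the names `cosine`, `link`, and the toy entity table are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between the character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def link(mention, entities, threshold=0.5):
    """Return (best_entity, score); entities maps a name to its alias list."""
    best_name, best_score = None, 0.0
    for name, aliases in entities.items():
        name_sim = cosine(mention, name)
        alias_sim = max((cosine(mention, alias) for alias in aliases), default=0.0)
        # Recall step: skip entities whose name and aliases all fall at or below the threshold.
        if max(name_sim, alias_sim) <= threshold:
            continue
        score = 0.5 * name_sim + 0.5 * alias_sim
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy entity table, purely for illustration.
entities = {"New York City": ["NYC", "New York"], "York": ["City of York"]}
best, score = link("New York", entities)
print(best, round(score, 3))
```

Weighting the best alias equally with the canonical name lets a mention that exactly matches an alias still win even when the canonical name is longer.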
pd.read_parquet() raises an error
Using pd.read_parquet() produces the following error:
$ python read_parquet.py
Traceback (most recent call last):
  File "read_parquet.py", line 3, in <module>
    df = pd.read_parquet('t1')
  File "/opt/userhome/atom_...
Original · 2019-10-16 17:02:48 · 5236 views · 0 comments
Converting a dense matrix to a sparse matrix
import numpy as np
from scipy import sparse
# dense matrix
A = np.array([[1,2,0],[0,0,3],[1,0,4]])
# sparse matrix
sA = sparse.csr_matrix(A)
# print dense matrix
print(A)
[[1 2 0]
 [0 0 3]
 [1 0 4]]...
Original · 2019-09-25 16:29:50 · 8281 views · 0 comments
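The excerpt stops after printing the dense matrix. As a small follow-on sketch (not the post's own continuation, which is truncated), the same matrix can be inspected in CSR form and converted back, using standard `scipy.sparse` attributes:

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 0], [0, 0, 3], [1, 0, 4]])
sA = sparse.csr_matrix(A)

# CSR stores only the non-zero values, their column indices,
# and per-row offsets into those two arrays.
print(sA.nnz)       # number of stored non-zero entries
print(sA.data)      # non-zero values, row by row
print(sA.indices)   # column index of each stored value
print(sA.indptr)    # row i occupies data[indptr[i]:indptr[i + 1]]

# Round-trip back to a dense NumPy array.
B = sA.toarray()
print(np.array_equal(A, B))
```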
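Regular expressions
The full set of regular expression symbols:
Symbol    Description
\         Escape character. For example, 'n' matches the character "n". '\n' matches a newline. The sequence '\\' matches "\", and "\(" matches "(".
^         Matches the start of the input string.
$         Matches the end of the input string.
*         Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
+         Matches the preceding subexpression one or more times....
Original · 2019-09-24 16:44:08 · 163 views · 0 comments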
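The symbols listed so far can be checked quickly with Python's `re` module. The table itself is language-agnostic, so using Python here is an assumption made for illustration.

```python
import re

# * matches the preceding subexpression zero or more times; {0,} is equivalent.
star_z = re.fullmatch(r"zo*", "z") is not None
star_zoo = re.fullmatch(r"zo*", "zoo") is not None
braces_zoo = re.fullmatch(r"zo{0,}", "zoo") is not None

# + requires at least one occurrence, so "z" alone does not match.
plus_z = re.fullmatch(r"zo+", "z") is not None
plus_zo = re.fullmatch(r"zo+", "zo") is not None

# ^ and $ anchor to the start and end of the input string.
starts_with_ab = re.search(r"^ab", "abc") is not None
ends_with_c = re.search(r"c$", "abc") is not None

# A backslash escapes the next character: \( matches "(" and \\ matches "\".
paren = re.search(r"\(", "f(x)").group()
backslash = re.search(r"\\", "a\\b").group()

print(star_z, star_zoo, braces_zoo, plus_z, plus_zo)
print(starts_with_ab, ends_with_c, paren, backslash)
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.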
LDA in practice with gensim
LDA in practice with gensim
from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamodel import LdaModel
# Create a corpus from a list of texts
texts = [...
Original · 2019-09-24 11:58:26 · 1073 views · 1 comment