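AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Fully Connected Neural Network (Part 2)
Previous post: AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Fully Connected Neural Network (Part 1)
Sentiment Classification with a Fully Connected Neural Network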
-
Environment
Ubuntu 16.04 LTS
Python 3.x
numpy
gensim
jieba
pandas
-
Dataset
weibo_senti_100k, a Weibo sentiment dataset
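The loader used later in this post reads `train.csv` and `test.csv` through their `text` and `label` columns, so the files are expected to look roughly like the sketch below. The two sample rows are made up for illustration and are not taken from weibo_senti_100k, the reading of 1 as positive and 0 as negative is an assumption, and the standard `csv` module stands in for pandas so the snippet runs without dependencies.

```python
import csv
import io

# Hypothetical two-row sample; real rows come from weibo_senti_100k.
# Assumption: label 1 means positive sentiment, label 0 means negative.
SAMPLE_CSV = """label,text
1,今天心情很好
0,这个服务太差了
"""


def read_text_and_labels(csv_file):
    """Return (texts, labels) from a CSV with `text` and `label` columns."""
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["text"])
        labels.append(int(row["label"]))
    return texts, labels


texts, labels = read_text_and_labels(io.StringIO(SAMPLE_CSV))
print(texts, labels)
```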
-
Data processing
-
Data loading
data_helper.py

    import numpy as np
    import pandas as pd
    import jieba
    import multiprocessing
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    max_text_len = 50  # 50 words
    word2vec_dimension = 100

    # Note: segment() and text_to_word2vec() are called below
    # but are not part of this listing.
    def load_train_test_data(train_path, test_path, w2v_model_path):
        # Load the data
        pd_all = pd.read_csv(train_path)
        train_data_x, train_data_y = pd_all.text.tolist(), pd_all.label.tolist()
        pd_all = pd.read_csv(test_path)
        test_data_x, test_data_y = pd_all.text.tolist(), pd_all.label.tolist()

        # The data set is too large, so only the first 5000 samples are used for training
        train_data_x = train_data_x[:5000]
        train_data_y = train_data_y[:5000]
        #test_data_x = test_data_x[:500]
        #test_data_y = test_data_y[:500]

        # Word segmentation
        train_data_x = [segment(k) for k in train_data_x]
        test_data_x = [segment(k) for k in test_data_x]

        # Convert the text to its word-vector representation
        w2v_model = Word2Vec.load(w2v_model_path)
        train_data_x = text_to_word2vec(w2v_model, train_data_x)
        test_data_x = text_to_word2vec(w2v_model, test_data_x)

        return train_data_x, test_data_x, train_data_y, test_data_y
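`segment` and `text_to_word2vec` are called by the loader but are not listed here. As a rough idea of what the second step might do, the sketch below maps each segmented text to a list of word vectors, keeps at most `max_text_len` words, and skips words the model does not know. Everything in it is an assumption: the name `texts_to_vectors`, the plain dict standing in for the trained Word2Vec model, the 3-dimensional toy vectors, and the choice to fall back to a single zero vector for texts with no known word.

```python
MAX_TEXT_LEN = 50   # mirrors max_text_len in data_helper.py
DIMENSION = 3       # toy size; the post uses word2vec_dimension = 100

# Hypothetical stand-in for a trained Word2Vec model: word -> vector.
TOY_VECTORS = {
    "今天": [0.1, 0.2, 0.3],
    "心情": [0.0, 0.5, 0.1],
    "很好": [0.9, 0.1, 0.4],
}


def texts_to_vectors(vectors, segmented_texts, max_len=MAX_TEXT_LEN, dim=DIMENSION):
    """Map each list of words to a list of word vectors (assumed behaviour)."""
    result = []
    for words in segmented_texts:
        # Keep the first max_len words, then drop out-of-vocabulary ones.
        rows = [vectors[w] for w in words[:max_len] if w in vectors]
        if not rows:
            # Assumption: a text with no known words becomes one zero vector,
            # so that averaging later never divides by zero.
            rows = [[0.0] * dim]
        result.append(rows)
    return result


segmented = [["今天", "心情", "很好"], ["未知词"]]
matrices = texts_to_vectors(TOY_VECTORS, segmented)
print([len(m) for m in matrices])
```

Each text ends up as a variable-length list of vectors, which is the shape that the averaging step in `train.py` expects.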
-
Model training
train.py

    import os, sys
    import data_helper
    import fc_net
    import numpy as np
    from datetime import datetime


    def get_result(vec):
        # Index of the largest component, i.e. the class it encodes
        max_value_index = 0
        max_value = 0
        for i in range(len(vec)):
            if vec[i] > max_value:
                max_value = vec[i]
                max_value_index = i
        return max_value_index


    def evaluate(network, test_data_set, test_labels):
        # Fraction of misclassified samples
        error = 0
        total = len(test_data_set)
        for i in range(total):
            label = get_result(test_labels[i])
            predict = get_result(network.predict(test_data_set[i]))
            if label != predict:
                error += 1
        return float(error) / float(total)


    def now():
        return datetime.now().strftime('%c')


    def to_sentence_vector(data):
        # Sum the word vectors of each text, then take the mean
        data_vec = np.zeros((len(data), len(data[0][0])))
        i = 0
        while i < len(data):
            n = data[i].shape[0]
            data_vec[i] = np.sum(data[i], axis=0)/n
            i += 1
        return data_vec


    def to_one_hot(label_list):
        # Soft targets: 0.9 for the true class, 0.1 for the other
        label_vec = []
        i = 0
        while i < len(label_list):
            if label_list[i] == 1:
                label_vec.append([0.1, 0.9])
            else:
                label_vec.append([0.9, 0.1])
            i += 1
        return label_vec


    def transpose(args):
        # Turn every sample and label into a column vector of shape (n, 1)
        return list(map(
            lambda arg: list(map(
                lambda line: np.array(line).reshape(len(line), 1)
                , arg))
            , args
        ))


    def train_and_evaluate(train_path, test_path, w2v_model_path):
        x_train, x_test, y_train, y_test = data_helper.load_train_test_data(train_path, test_path, w2v_model_path)
        x_train = to_sentence_vector(x_train)
        x_test = to_sentence_vector(x_test)
        y_train = to_one_hot(y_train)
        y_test = to_one_hot(y_test)
        x_train, x_test, y_train, y_test = transpose((x_train, x_test, y_train, y_test))

        # 1. Sentence vector length 100, 1 hidden layer of 50 nodes, 2 output classes
        network = fc_net.Network([100, 50, 2])
        # 2. Sentence vector length 100, 5 hidden layers of 50 nodes each, 2 output classes
        #network = fc_net.Network([100, 50, 50, 50, 50, 50, 2])

        last_error_ratio = 1.0
        epoch = 0
        while True:
            epoch += 1
            network.train(y_train, x_train, 0.1, 1)
            #print('%s epoch %d finished' % (now(), epoch))
            if epoch % 10 == 0:
                error_ratio = evaluate(network, x_test, y_test)
                print('%s after epoch %d, error ratio is %f' % (now(), epoch, error_ratio))
                # Stop early once the accuracy no longer improves
                if error_ratio > last_error_ratio:
                    break
                else:
                    last_error_ratio = error_ratio


    if __name__ == "__main__":
        train_path = './data/train.csv'
        test_path = './data/test.csv'
        w2v_model_path = './data/word2vec.model'
        train_and_evaluate(train_path, test_path, w2v_model_path)
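To make the preprocessing in `train_and_evaluate` concrete, here is a dependency-free sketch of its three small transformations: averaging word vectors into one sentence vector, turning a 0/1 label into the soft target pair, and reading the class back from an output vector. It uses plain lists instead of numpy arrays, and the helper names `mean_vector`, `soft_target` and `argmax` are illustrative, not part of the original code.

```python
def mean_vector(word_vectors):
    """Average a non-empty list of equal-length word vectors."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def soft_target(label):
    """0.9 for the true class and 0.1 for the other, as in to_one_hot."""
    return [0.1, 0.9] if label == 1 else [0.9, 0.1]


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


sentence = mean_vector([[1.0, 2.0], [3.0, 6.0]])
print(sentence)                  # [2.0, 4.0]
print(argmax(soft_target(1)))    # 1
print(argmax(soft_target(0)))    # 0
```

Unlike `get_result`, which starts its running maximum at 0, this `argmax` also works when every entry is negative; for vectors whose entries are all positive the two agree.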
-
Run
python train.py
-
Output
1. Sentence vector length 100, 1 hidden layer of 50 nodes, 2 output classes

    network = fc_net.Network([100, 50, 2])

    Mon Sep 23 20:27:46 2019 after epoch 10, error ratio is 0.124594
    Mon Sep 23 20:27:50 2019 after epoch 20, error ratio is 0.121677
    Mon Sep 23 20:27:54 2019 after epoch 30, error ratio is 0.115593
    Mon Sep 23 20:27:58 2019 after epoch 40, error ratio is 0.139012
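Conclusion: accuracy on the validation set is about 88.44% (1 − 0.115593, the lowest error ratio in the log, reached at epoch 30), somewhat higher than the 81.89% of the perceptron model in "AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Perceptron (Part 2)".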
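The accuracy figure can be recovered from the log itself. The sketch below parses the four log lines of the first run, replays the stopping rule from `train.py` (stop as soon as the error ratio exceeds the previous checkpoint), and converts the lowest error ratio into an accuracy. The helper names `parse_log` and `first_increase` are illustrative.

```python
import re

# Log lines copied from the first run above.
LOG = """\
Mon Sep 23 20:27:46 2019 after epoch 10, error ratio is 0.124594
Mon Sep 23 20:27:50 2019 after epoch 20, error ratio is 0.121677
Mon Sep 23 20:27:54 2019 after epoch 30, error ratio is 0.115593
Mon Sep 23 20:27:58 2019 after epoch 40, error ratio is 0.139012
"""

PATTERN = re.compile(r"after epoch (\d+), error ratio is ([0-9.]+)")


def parse_log(text):
    """Return a list of (epoch, error_ratio) pairs found in the log."""
    return [(int(e), float(r)) for e, r in PATTERN.findall(text)]


def first_increase(history):
    """Epoch at which the error first exceeds the previous checkpoint, or None."""
    last = 1.0  # same starting value as last_error_ratio in train.py
    for epoch, error in history:
        if error > last:
            return epoch
        last = error
    return None


history = parse_log(LOG)
best_epoch, best_error = min(history, key=lambda pair: pair[1])
print("stopped at epoch", first_increase(history))
print("best epoch", best_epoch, "accuracy %.4f" % (1.0 - best_error))
```

For this log the loop stops at epoch 40, one checkpoint after the best one at epoch 30, which is why the reported accuracy uses the epoch 30 value rather than the last line.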
2. Sentence vector length 100, 5 hidden layers of 50 nodes each, 2 output classes

    network = fc_net.Network([100, 50, 50, 50, 50, 50, 2])

    Mon Sep 23 20:30:57 2019 after epoch 10, error ratio is 0.144929
    Mon Sep 23 20:31:06 2019 after epoch 20, error ratio is 0.103842
    Mon Sep 23 20:31:16 2019 after epoch 30, error ratio is 0.113509
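Conclusion: accuracy on the validation set is 89.62% (1 − 0.103842, reached at epoch 20). The network with 5 hidden layers scores about 1.2 percentage points higher than the one with 1 hidden layer in this run, which suggests that the depth of the network is paying off. That is the appeal of the "deep" in "deep learning"!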
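One way to put the two configurations side by side is to count their trainable parameters. The sketch below assumes that every pair of adjacent layers in `fc_net.Network` is fully connected, with one weight per connection and one bias per output node; `fc_net` is not listed in this post, so treat that as an assumption. `count_parameters` is an illustrative helper.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per output node (assumed)
    return total


shallow = count_parameters([100, 50, 2])
deep = count_parameters([100, 50, 50, 50, 50, 50, 2])
print(shallow, deep)  # 5152 15352
```

Under these assumptions the deeper network has roughly three times as many parameters, so part of the difference may come from the extra capacity as well as from the depth itself.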
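Related
Previous post:
AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Fully Connected Neural Network (Part 1)
Perceptron:
AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Perceptron (Part 1)
AI in Practice, From Beginner to Expert Series: Sentiment Classification with a Perceptron (Part 2)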