X_train => X_count_filter_train (CountVectorizer)
X_test => X_count_filter_test (CountVectorizer)
X_train => X_tfidf_filter_train (TfidfVectorizer)
X_test => X_tfidf_filter_test (TfidfVectorizer)
X_count_filter_train => (fit model) y_train (CountVectorizer + Bayes classifier)
X_count_filter_test => (predict) y_count_filter_predict (CountVectorizer + Bayes classifier)
X_tfidf_filter_train => (fit model) y_train (TfidfVectorizer + Bayes classifier)
X_tfidf_filter_test => (predict) y_tfidf_predict (TfidfVectorizer + Bayes classifier)
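This article describes how to extract text features with CountVectorizer and TfidfVectorizer, and how to train and predict with a Bayes classifier on those features. It covers converting the raw data into feature vectors, then using those vectors to train a model and test it.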
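The flow above can be sketched in scikit-learn roughly as follows. This is a minimal illustration, not the original author's code. Several things are assumptions: the toy corpus and labels are invented, the "Bayes classifier" is taken to be `MultinomialNB`, the `filter` in the variable names is taken to mean English stop-word filtering (`stop_words="english"`), and the helper `fit_and_predict` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus invented for illustration only (not from the original post).
X_train = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
y_train = ["sport", "sport", "tech", "tech"]
X_test = ["the goalkeeper saved the match", "a processor with more memory"]


def fit_and_predict(vectorizer, X_train, y_train, X_test):
    """Hypothetical helper: vectorize, fit a naive Bayes model, predict."""
    # Learn the vocabulary on the training text only, then reuse it for the
    # test text so both matrices share the same columns.
    X_filter_train = vectorizer.fit_transform(X_train)
    X_filter_test = vectorizer.transform(X_test)

    # Assumption: the "Bayes classifier" is multinomial naive Bayes.
    model = MultinomialNB()
    model.fit(X_filter_train, y_train)
    return model.predict(X_filter_test)


# X_train / X_test => count features => fit => predict
y_count_filter_predict = fit_and_predict(
    CountVectorizer(stop_words="english"), X_train, y_train, X_test
)

# X_train / X_test => TF-IDF features => fit => predict
y_tfidf_predict = fit_and_predict(
    TfidfVectorizer(stop_words="english"), X_train, y_train, X_test
)

print("CountVectorizer + naive Bayes:", list(y_count_filter_predict))
print("TfidfVectorizer + naive Bayes:", list(y_tfidf_predict))
```

The key detail is that each vectorizer calls `fit_transform` on the training text and only `transform` on the test text. Fitting the vectorizer on the test set as well would produce a different vocabulary and leak test information into the features.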