A Survey of Sentiment Classification

This article surveys key recent research in sentiment classification. It covers baseline algorithms such as Naïve Bayes, maximum entropy models, and support vector machines, as well as the importance of sentence-level features. Integrating a sentence classifier with a language model, and weighting sentences by their position, significantly improves system accuracy.



This is a preliminary survey of recent important papers on sentiment classification. It covers only a few papers and summarizes fairly basic, general-purpose methods, drawing mainly on Pang Bo's research. Below is the outline I put together, written in English.

Baseline Algorithm

•      Produce a list of sentiment words by introspection and rely on them alone to classify the texts



•      Algorithm used in Minghui's paper to predict the polarity of user interaction

•      Combine three popular lexicons to get an English lexicon

•      Check whether there are more positive sentiment words or more negative sentiment words in the expression
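A minimal sketch of this lexicon-counting baseline. The two word sets here are tiny illustrative stand-ins, not the merged lexicon the paper actually uses:

```python
# Hypothetical mini-lexicon; a real system would merge three published lexicons.
POSITIVE = {"good", "great", "excellent", "wonderful", "enjoyable"}
NEGATIVE = {"bad", "terrible", "boring", "awful", "dull"}

def lexicon_polarity(text):
    """Label an expression by comparing counts of positive vs. negative words."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie, or no sentiment words found
```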

Machine Learning Methods

•       Naïve Bayes

•       Maximum Entropy Model

•       Support Vector Machines

•       Need labeled data

–   http://www.cs.cornell.edu/People/pabo/movie-review-data/

 

Naïve Bayes

•       Assign to a given document d the class c* = argmax_c P(c|d)

•       Bayes' rule: P(c|d) = P(c) P(d|c) / P(d)

•       NB classifier (assuming the features f_i are conditionally independent given the class, where n_i(d) is the number of times f_i occurs in d): P_NB(c|d) = P(c) ( ∏_{i=1}^{m} P(f_i|c)^{n_i(d)} ) / P(d)

•       Simple, but has high predictive power
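The NB classifier above can be sketched in a few lines of pure Python. Add-one (Laplace) smoothing and the log-space argmax are standard choices; the toy training documents in the test are hypothetical, not from the movie-review corpus:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, class_label) pairs.
    Collects class priors and per-class word counts for multinomial NB."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in docs:
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, len(docs)

def classify_nb(model, tokens):
    """Return argmax_c of log P(c) + sum_i n_i(d) log P(f_i | c),
    with add-one smoothing on the conditional estimates."""
    class_counts, word_counts, vocab, n = model
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / n)
        total = sum(word_counts[c].values())
        for t in tokens:
            lp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```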

Maximum Entropy Model

•      The estimate of P(c|d) takes the following exponential form, where Z(d) is a normalization factor: P_ME(c|d) = (1/Z(d)) exp( Σ_i λ_{i,c} F_{i,c}(d, c) )

•      F_{i,c} is a feature/class function for feature f_i and class c, defined as: F_{i,c}(d, c') = 1 if n_i(d) > 0 and c' = c, and 0 otherwise

•       Toolkit: Zhang Le's (2004) Maximum Entropy Modeling Toolkit for Python and C++
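With binary feature/class functions like F_{i,c} above, the two-class MaxEnt model reduces to logistic regression. The sketch below trains it by plain gradient ascent on the conditional log-likelihood; real toolkits use much better optimizers (iterative scaling or quasi-Newton methods), and the learning rate and epoch count here are illustrative:

```python
import math

def train_maxent(docs, epochs=200, lr=0.5):
    """docs: list of (feature_set, label) with label in {0, 1}.
    Gradient ascent on the conditional log-likelihood of a
    binary logistic-regression / MaxEnt model."""
    vocab = sorted({f for feats, _ in docs for f in feats})
    w = {f: 0.0 for f in vocab}
    b = 0.0
    for _ in range(epochs):
        for feats, y in docs:
            z = b + sum(w[f] for f in feats if f in w)
            p = 1.0 / (1.0 + math.exp(-z))  # P(c = 1 | d)
            g = y - p                       # gradient of the log-likelihood
            b += lr * g
            for f in feats:
                w[f] += lr * g
    return w, b

def classify_maxent(model, feats):
    """Predict class 1 iff the linear score is positive."""
    w, b = model
    z = b + sum(w.get(f, 0.0) for f in feats)
    return 1 if z > 0 else 0
```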

Support Vector Machines

•      Given a set of training data, the SVM classifier finds the hyperplane such that each training point is correctly classified and the hyperplane is as far as possible from the points closest to it

•      Toolkits: SVMlight, LibSVM, PyML
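The max-margin idea can be sketched with a Pegasos-style subgradient method on the regularized hinge loss. This is only an illustration of the objective; the listed toolkits solve it far more carefully, and lam/epochs here are arbitrary toy settings:

```python
def train_linear_svm(data, dim, epochs=200, lam=0.01):
    """data: list of (feature_vector, label) with labels in {-1, +1}.
    Pegasos-style stochastic subgradient descent on the hinge loss."""
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1:                  # point inside the margin: push out
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    """Classify by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```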

 

Considerations about Features

•       Unigrams vs. Bigrams vs. Both

•       Feature frequency vs. Presence

•      POS tags

•       Position of words

•       Negation words

•       TF-IDF

•       Adjectives and verbs
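Several of these feature choices can be combined in one extractor. The sketch below uses Pang-style negation handling (prefix NOT_ to every word between a negation word and the next punctuation mark), optional bigrams, and presence instead of frequency; the negation-word and punctuation lists are illustrative:

```python
NEGATIONS = {"not", "no", "never", "n't", "isn't", "don't"}
PUNCT = {".", ",", "!", "?", ";"}

def features(tokens, use_bigrams=True, presence=True, tag_negation=True):
    """Unigram/bigram features with negation tagging.
    With presence=True each distinct feature counts once (Pang et al.
    found presence to work better than frequency)."""
    toks = []
    negating = False
    for t in tokens:
        if tag_negation and t in PUNCT:
            negating = False   # negation scope ends at punctuation
            continue
        toks.append("NOT_" + t if negating else t)
        if tag_negation and t in NEGATIONS:
            negating = True
    feats = list(toks)
    if use_bigrams:
        feats += [a + "_" + b for a, b in zip(toks, toks[1:])]
    if presence:
        feats = sorted(set(feats))
    return feats
```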

 

Pang’s Result


Majority Voting

•       Combining the Naïve Bayes, MaxEnt, and SVM classifiers over the same data provided a three-to-four-percent boost over the best of the individual classifiers alone.

Integrating a Sentence Classifier with a Language Model

•      The sentence classifier is run on each sentence of the review to obtain a decision of "positive" or "negative".

•      The sentences then "vote" the review as negative or positive on the basis of their probability scores.

•      After that, the decisions from the sentence-level vote and from the review classifier are combined using optimum weights.
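The combination step can be sketched as a weighted average of the two classifiers' positivity scores. The weight alpha below is an illustrative value, not the tuned optimum from the paper:

```python
def combine_scores(review_score, sentence_scores, alpha=0.6):
    """Weighted combination of the review-level classifier's positivity
    score (in [0, 1]) with the mean sentence-level positivity.
    alpha is the review classifier's weight; 0.6 is illustrative."""
    sent = sum(sentence_scores) / len(sentence_scores)
    combined = alpha * review_score + (1 - alpha) * sent
    return "positive" if combined >= 0.5 else "negative"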

Performance Improvement with Information from the Sentence Level

 

The Sentence Classifier

•       Also based on the Naïve Bayes algorithm

•      Run on the 5331 negative and 5331 positive sentences

 

Vote to Classify Reviews

•      The majority vote is used to classify the review.

•      Using the actual positivity and negativity scores computed for each sentence gives a considerable 1.9% improvement in performance.

 

Weighting Sentences by Position

•      Weight each sentence by its position in the review.

•      Give more weight to sentences toward the beginning and end of a review.
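A sketch of the position-weighting idea: sentences in the first and last stretch of the review get extra weight in the vote. The boost factor and edge fraction are illustrative values, not the settings tuned in the project:

```python
def position_weight(i, n, boost=2.0, frac=0.2):
    """Extra weight for sentences near the beginning or end of a review
    of n sentences; boost and frac are illustrative, not tuned."""
    edge = max(1, int(n * frac))
    return boost if i < edge or i >= n - edge else 1.0

def weighted_vote(sentence_scores, boost=2.0, frac=0.2):
    """sentence_scores: per-sentence positivity scores in [0, 1],
    in document order; returns the position-weighted decision."""
    n = len(sentence_scores)
    num = sum(position_weight(i, n, boost, frac) * s
              for i, s in enumerate(sentence_scores))
    den = sum(position_weight(i, n, boost, frac) for i in range(n))
    return "positive" if num / den >= 0.5 else "negative"
```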

 

Incorporating the "Focus" of a Sentence

•      "Focus" here means whether a given sentence is talking about the movie or not.

•      Two approaches:

–    Check if a sentence contains phrases such as "this movie", "the plot", "the actors", etc.

–   Check if a sentence contains the movie's name.
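Both focus checks are simple substring tests. In this sketch, the cue-phrase list repeats the examples above, and movie_names is assumed to be supplied by the caller (e.g. from review metadata):

```python
# Cue phrases from the first approach; extend as needed.
MOVIE_CUES = {"this movie", "the plot", "the actors"}

def sentence_in_focus(sentence, movie_names=()):
    """True if the sentence appears to be about the movie, using either
    cue phrases (approach 1) or the movie's own name (approach 2)."""
    s = sentence.lower()
    return any(cue in s for cue in MOVIE_CUES) or any(
        name.lower() in s for name in movie_names)
```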

 

Discussion

•      Using classification of sentences in addition to the Naïve Bayes model increased the accuracy of the system by 5.5%.

•      Use other algorithms, such as MaxEnt or SVM, for the sentence classifier.

•      Divide the review into paragraphs and give different weights to paragraphs according to their positions.


References

•      1. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. (Proceedings of EMNLP)

•      2. Soo-Min Kim et al. 2004. Determining the Sentiment of Opinions. (Proceedings of COLING)

•      3. Sunil Khanal. 2010. Sentiment Classification using Language Models and Sentence Position Information. (Stanford CS 224N final project)

•      4. Michel Galley et al. 2004. Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies. (Proceedings of ACL)
