Opinion Spam Detection
Problem
- origin: positive opinions often mean profit and fame for businesses and individuals
- leads to: spam that promotes or discredits some target products, services, organizations, individuals, and even
- Difficulty: opinion spam is rampant and also sophisticated
- Significance: ensuring that social media continues to be a trusted source of public opinions
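- Dianping (大众点评)
- travel experiences (users are asked to write reviews)
- Taobao (淘宝)
- brushing (刷单): fake orders placed to inflate sales and ratings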
Spam detection
- web spam
- link spam
- content spam
- mail spam
- opinion spam
- challenge: unlike other forms of spam, which one can recognize fairly easily, fake opinions are very hard, if not impossible, to recognize by manually reading them. This makes it difficult to find opinion spam data to help design and evaluate detection algorithms.
- If humans cannot do this well, can the task be handed to machines?
Role
- individual spammers
- group spammers
- highly damaging
- type1: a group of spammers works in collusion
- type2: a single person uses multiple user IDs to spam
Type
- fake reviews
- untruthful reviews that are written not from the reviewers' genuine experiences of using the products or services, but with hidden motives
- reviews about brands only
- These reviews do not comment on the specific products or services that they are supposed to review, but only comment on the brands or the manufacturers of the products
- “I hate HP. I never buy any of their products”
- non-reviews
- advertisements
- irrelevant texts containing no opinions (questions, answers, random texts)
- types 2 and 3 spam reviews (brand-only reviews and non-reviews) are rare and relatively easy to detect using supervised learning (Jindal and Liu, 2008)
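
As a rough illustration of the supervised route mentioned for types 2 and 3, the sketch below trains a tiny bag-of-words Naive Bayes classifier that separates ordinary product reviews from brand-only reviews and non-reviews. It is not the method of Jindal and Liu (2008). The toy training texts, the label names (`review`, `brand_only`, `non_review`), and the functions `tokenize`, `train`, and `predict` are all hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Collect per-label word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, hand-written toy data (not taken from any real corpus).
TOY_EXAMPLES = [
    ("The battery lasts two days and the screen is sharp", "review"),
    ("The printer jams often but the print quality is good", "review"),
    ("I hate this brand and never buy any of their products", "brand_only"),
    ("This brand is the best, I love everything the company makes", "brand_only"),
    ("Buy cheap watches at our store, visit our website today", "non_review"),
    ("Does anyone know whether this works with a Mac?", "non_review"),
]

model = train(TOY_EXAMPLES)
print(predict(model, "I never buy products from this brand"))
print(predict(model, "Visit our website to buy cheap watches"))
```

Word overlap is enough here because brand-only reviews and non-reviews differ visibly in content from ordinary reviews. The same approach would not be expected to work for type 1 fake reviews, which are written to read like genuine ones.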
Unsupervised
difficulty of manually labeling training data
1. Atypical Behavior
for product
1.1 different review patterns
- Each model gives a numeric spamming behavior score
- all scores are combined to produce the final spam score
- model
- Targeting products: a spammer will direct most of their efforts at promoting or victimizing a few target products
- Targeting groups: spammers manipulating ratings of a set of products sharing some attrib