Spam Review Detection

Opinion Spam Detection

Problem
  • Origin: positive opinions often mean profit and fame for businesses and individuals
  • Leads to: spamming to promote or to discredit target products, services, organizations, and individuals
  • Difficulty: opinion spam is rampant and also sophisticated
  • Significance: detection ensures that social media continues to be a trusted source of public opinion
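  • Dianping (大众点评)
    • travel experiences (users are asked to write reviews)
  • Taobao (淘宝)
    • brushing (刷单): fake orders placed to inflate sales and ratings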
Spam detection
  • web spam
    • link spam
    • content spam
  • mail spam
  • opinion spam
  • Challenge: unlike other forms of spam, fake opinions are very hard, if not impossible, to recognize by manually reading them. This makes it difficult to find opinion spam data for designing and evaluating detection algorithms. Other forms of spam can be recognized fairly easily.
  • Can a task that humans do poorly be handed over to machines?
Role
  • individual spammers
  • group spammers
    • highly damaging
    • type1: a group of spammers works in collusion
    • type2: a single person uses multiple user IDs to spam
Type
  • fake reviews
    • untruthful reviews written not from the reviewer's genuine experience of using the product or service, but with hidden motives
  • reviews about brands only
    • these reviews do not comment on the specific products or services they are supposed to review, only on the brands or manufacturers of the products
    • “I hate HP. I never buy any of their products”
  • non-reviews
    • advertisements
    • irrelevant texts containing no opinions (questions, answers, random texts)
  • types 2 and 3 (brand-only reviews and non-reviews) are rare and relatively easy to detect using supervised learning (Jindal and Liu, 2008)
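
To make the supervised-learning remark concrete, here is a minimal sketch of a text classifier that separates ordinary reviews from brand-only reviews and non-reviews. It is an assumption-laden illustration, not the method of Jindal and Liu (2008), whose features and classifier are not described in these notes. The multinomial naive Bayes model, the label names, and the toy training sentences are all hypothetical; only the HP sentence is taken from the notes above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n / n_docs)  # class prior
            for w in tokens:
                count = self.word_counts[label][w]
                logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy, hand-written training set (hypothetical data for illustration only).
TRAIN = [
    ("The battery lasts two days and the screen is sharp", "review"),
    ("The printer jams often and the ink runs out fast", "review"),
    ("I hate HP. I never buy any of their products", "brand_only"),
    ("This brand is the worst and I never buy their products", "brand_only"),
    ("Buy cheap watches at our store, visit now", "non_review"),
    ("Does anyone know where to buy a charger?", "non_review"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

Calling `model.predict(text)` returns one of the three assumed labels. A realistic detector would need far more labeled data and richer features than word counts.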

Unsupervised

Motivation: manually labeling training data is difficult

1. Atypical Behavior

for product

1.1 different review patterns
  • Each model gives a numeric spamming behavior score
  • all scores are combined to produce the final spam score
  • models
    • Targeting products: a spammer directs most of their effort at promoting or victimizing a few target products
    • Targeting groups: spammers manipulating ratings of a set of products sharing some attrib
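
The "score per model, then combine" idea can be sketched as follows. The notes do not say how the scores are combined, so the weighted average here is an assumption, as are the field names, the [0, 1] score range, and the helper functions `combine_scores` and `rank_reviewers`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BehaviorScores:
    """Per-reviewer scores, one per behavior model (assumed to lie in [0, 1])."""
    targeting_products: float
    targeting_groups: float


def combine_scores(scores: BehaviorScores, weights: Dict[str, float]) -> float:
    """Weighted average of the per-model scores (an assumed combination rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted / total_weight


def rank_reviewers(all_scores: Dict[str, BehaviorScores],
                   weights: Dict[str, float]) -> List[str]:
    """Return reviewer ids sorted from most to least suspicious."""
    final = {rid: combine_scores(s, weights) for rid, s in all_scores.items()}
    return sorted(final, key=final.get, reverse=True)
```

With equal weights, a reviewer scoring 0.8 on targeting products and 0.4 on targeting groups gets a final spam score of 0.6. Because the output is a ranking, no spam/non-spam threshold has to be fixed in advance.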