Exploring the Power of Links in Data Mining: Notes from Jiawei Han's Talk

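Professor Jiawei Han presented his group's recent research in data mining, covering four pieces of work: link-based classification, user-guided clustering, link-based clustering, and object distinction analysis. These methods performed well across a range of tasks.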


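 Jiawei Han (韩家炜) is a towering figure in data mining whose name everyone in the field knows, and today I was lucky enough to see him in person. My first impression, oddly enough, was that he must have been quite handsome when he was young (embarrassing to admit!). He is, of course, still a sharp and energetic scholar today.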

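   The talk was titled "Exploring the Power of Links in Data Mining". It covered four papers, all by his PhD student Xiaoxin Yin. Most of this work was inspired by link-analysis algorithms such as PageRank and HITS. By exploiting the links between data objects, we can extract the information we care about more effectively. In comparisons with related algorithms, the algorithms proposed in these four papers all came out clearly ahead.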

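   1. CrossMine: link propagation is controlled, so weaker links are not followed. This greatly reduces running time while keeping accuracy high. The advantage is not obvious when there are few relations, but it becomes substantial as the number of relations grows.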

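   2. User-Guided Clustering: similar to semi-supervised learning. The user specifies a feature they consider important, and clustering then proceeds with that guidance. Here an entire feature column is taken as the hint. It serves only as a soft hint, a point of reference, and other factors still have to be taken into account.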

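   3. LinkClus: from the papers people publish, we can find how closely related different conferences are. Conferences in which the same author publishes are strongly linked. Earlier algorithms were very slow; LinkClus exploits the power law distribution of links. It identifies the dense links, and because dense links are few, analyzing only those gives a large speedup. At the same time, most of the information is contained in those dense links, so accuracy remains good.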

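   4. Object distinction: how do we tell apart papers published by different people who share the same name? This is especially a problem for Chinese names, many of which collide once they are romanized; for Wei Wang (王伟) alone there are as many as 14 people. The approach uses the co-author information in the papers. It first trains on authors whose names are very unlikely to be shared, treating them as clean data, and then works outward from them to classify the rest.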
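
The co-author idea in item 4 can be illustrated with a small sketch. This is not the method from the Object Distinction paper: it skips the training step on rare names and simply groups papers carrying the same ambiguous name whenever they share a co-author, using union-find. The function name `group_papers_by_coauthors`, the paper IDs, and the co-author names are all made up for illustration.

```python
def group_papers_by_coauthors(papers):
    """Group papers that share at least one co-author (directly or transitively).

    papers: dict mapping a paper ID to the set of its co-authors,
            excluding the ambiguous name itself. Hypothetical input format.
    """
    parent = {pid: pid for pid in papers}

    def find(x):
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_paper_of = {}  # co-author -> first paper seen with that co-author
    for pid, coauthors in papers.items():
        for coauthor in coauthors:
            if coauthor in first_paper_of:
                # Shared co-author: merge the two papers' groups.
                parent[find(pid)] = find(first_paper_of[coauthor])
            else:
                first_paper_of[coauthor] = pid

    groups = {}
    for pid in papers:
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(sorted(group) for group in groups.values())


# Illustrative data only: five papers all signed with the same ambiguous name.
papers = {
    "p1": {"coauthor_a"},
    "p2": {"coauthor_a", "coauthor_b"},
    "p3": {"coauthor_c"},
    "p4": {"coauthor_c", "coauthor_d"},
    "p5": {"coauthor_e"},
}
groups = group_papers_by_coauthors(papers)
print(groups)
```

Each resulting group is a guess at one real person. A shared co-author is only weak evidence on its own, which is presumably why the talk stressed starting from clean data.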

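    Finally, he described Xiaoxin Yin's recent research direction: telling true information from false on the Web. It rests on the assumption that there is only one true version of a fact, while false versions vary endlessly.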
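
A minimal sketch of what that assumption buys you: if the true value is unique and false values scatter, then the value repeated by the most sources is a reasonable first guess. This is plain voting, a stand-in I chose for illustration and not the method from Yin's research; the function name `most_supported_value` and the sample claims are hypothetical.

```python
from collections import Counter


def most_supported_value(claims):
    """Return the most frequently claimed value and its share of all claims.

    claims: non-empty list of (source, value) pairs. Hypothetical input format.
    """
    votes = Counter(value for _source, value in claims)
    value, count = votes.most_common(1)[0]
    return value, count / len(claims)


# Illustrative data only: five sites stating some fact.
claims = [
    ("site1", "value_x"),
    ("site2", "value_x"),
    ("site3", "value_y"),
    ("site4", "value_x"),
    ("site5", "value_z"),
]
best, share = most_supported_value(claims)
print(best, share)
```

Voting treats every source as equally reliable and breaks down when sites copy each other, so it should be read only as a baseline for the assumption stated above.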

    Finally, hats off once again to a master of the field!

    Here are the talk abstract and Professor Han's bio:

ABSTRACT
Algorithms like PageRank and HITS have been developed in late 1990s to
explore links among Web pages to discover authoritative pages and hubs.
Links have also been popularly used in citation analysis and social network
analysis.  We show that the power of links can be explored thoroughly at
data mining in classification, clustering, information integration, and
other interesting tasks.  Some recent results of our research that explore
the crucial information hidden in links will be introduced, including (1)
multi-relational classification, (2) user-guided clustering, (3) link-based
clustering, and (4) object distinction analysis.  The power of links in
other analysis tasks will also be discussed in the talk.
------------------------
Short bio:
Jiawei Han, Professor, Department of Computer Science, University of
Illinois at Urbana-Champaign.  He has been working on research into data
mining, data warehousing, database systems, data mining from spatiotemporal
data, multimedia data, stream and RFID data, Web data, social network data,
and biological data, with over 300 journal and conference publications.  He
has chaired or served on over 100 program committees of international
conferences and workshops, including PC co-chair of 2005 (IEEE)
International Conference on Data Mining (ICDM), Americas Coordinator of
2006 International Conference on Very Large Data Bases (VLDB).  He is also
serving as the founding Editor-In-Chief of ACM Transactions on Knowledge
Discovery from Data.  He is an ACM Fellow and has received 2004 ACM SIGKDD
Innovations Award and 2005 IEEE Computer Society Technical Achievement
Award. His book "Data Mining: Concepts and Techniques" (2nd ed., Morgan
Kaufmann, 2006) has been popularly used as a textbook worldwide.

Professor Han's home page:

http://www-faculty.cs.uiuc.edu/~hanj/

The four papers covered in the talk:

CrossMine: Efficient Classification from Multiple Heterogeneous Databases

Cross-Relational Clustering with User's Guidance

LinkClus: Efficient Clustering via Heterogeneous Semantic Links

Object Distinction: Distinguishing Objects with Identical Names by Link Analysis

Notes on another talk he gave:

http://users.ir-lab.org/~bill_lang/blog10/archives/001166.html

 