Data Mining Notes, Chapter 1: Introduction

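Data Mining Overview
These notes introduce the concept of data mining and why it matters, and outline the steps of the data mining process: choosing the application domain, data preprocessing, feature selection, and choosing the mining methods and algorithms. They also cover data mining functionalities and the kinds of data that can be mined.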

Textbook: Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, China Machine Press (2007)

 

Lecture 1: Introduction

1)  Why data mining?

Necessity is the mother of invention.

 

2) What is data mining?

Data mining (knowledge discovery from data): extracting or mining knowledge from large amounts of data.

Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.

Alternative name: knowledge discovery (mining) in databases (KDD).

 

Steps of a KDD Process

Learning the application domain: relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing (may take 60% of the effort!)

Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation

Choosing functions of data mining: summarization, classification, regression, association, clustering

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge
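
The steps above can be sketched as a chain of small functions. This is only an illustrative sketch, not code from the textbook: the sample records, the field names (id, age, spend, notes), the age bands, and the function names are all hypothetical, and the "mining" step is a simple summarization (mean spend per age band).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25, "spend": 120.0, "notes": "a"},
    {"id": 2, "age": None, "spend": 80.0, "notes": "b"},
    {"id": 3, "age": 41, "spend": 300.0, "notes": "c"},
    {"id": 3, "age": 41, "spend": 300.0, "notes": "c"},  # duplicate record
    {"id": 4, "age": 38, "spend": None, "notes": "d"},
    {"id": 5, "age": 52, "spend": 310.0, "notes": "e"},
    {"id": 6, "age": 23, "spend": 95.0, "notes": "f"},
]


def select(records, fields=("id", "age", "spend")):
    """Create the target data set: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop duplicate ids and records with missing values."""
    seen, out = set(), []
    for r in records:
        if r["id"] in seen or any(v is None for v in r.values()):
            continue
        seen.add(r["id"])
        out.append(r)
    return out


def transform(records):
    """Data reduction: replace the raw age with a coarse age band."""
    return [
        {"band": "under_40" if r["age"] < 40 else "40_plus", "spend": r["spend"]}
        for r in records
    ]


def mine(records):
    """Data mining (summarization): record count and mean spend per band."""
    groups = defaultdict(list)
    for r in records:
        groups[r["band"]].append(r["spend"])
    return {band: (len(v), mean(v)) for band, v in groups.items()}


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only summaries backed by enough records."""
    return {band: avg for band, (n, avg) in patterns.items() if n >= min_count}


def kdd(records):
    """Run the steps in order: select, clean, transform, mine, evaluate."""
    return evaluate(mine(transform(clean(select(records)))))


if __name__ == "__main__":
    print(kdd(RAW))
```

In practice each stage is far more involved, and the process is usually iterative: results from pattern evaluation often send the analyst back to selection or cleaning.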

Architecture: Typical Data Mining System

 

3) On what kind of data?

Traditional databases and applications

    Relational databases, data warehouses, transactional databases

Advanced databases and advanced applications

    Object-relational databases

    Temporal databases, sequence data (incl. biosequences), time-series data

    Spatial databases and spatiotemporal databases

    Text databases and multimedia databases

    Heterogeneous databases and legacy databases

    Data streams and sensor data

    Structured data, graphs, social networks, and link databases

    The World Wide Web

 

4) Data Mining Functionalities

   Class/concept description: characterization and discrimination

   Frequent patterns, association, correlation, and causality

   Classification and prediction

   Cluster analysis

   Outlier analysis

   Trend and evolution analysis
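
As one concrete illustration of the functionalities listed above, frequent patterns and association can be sketched with the two standard measures for an association rule A => B: support (the fraction of transactions containing both A and B) and confidence (the fraction of transactions containing A that also contain B). The transactions and function names below are hypothetical, and the brute-force pair enumeration is only a sketch, not an efficient mining algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute force: every item pair whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


if __name__ == "__main__":
    print(frequent_pairs(TRANSACTIONS, min_support=0.5))
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
```

Thresholds on support and confidence are one simple way to filter out uninteresting patterns; the appropriate values depend on the application.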

 

5) Are all the patterns interesting?

 

6) Classification of data mining systems
