Learning Notes of Dr. Bo Yuan (THU) 《Data: Theory and Algorithm》 Part I
- Definition: Data Mining is the process of automatically extracting interesting and useful hidden patterns from data that is usually massive, incomplete, and noisy.
It is not a fully automatic process.
From data to intelligence.
Data, information, knowledge, decision support
Classification
Algorithms:
Decision Tree, KNN, Neural Networks, SVM
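To make one of the listed algorithms concrete, here is a minimal sketch of KNN (k-nearest neighbors) in plain Python. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are made up for illustration and are not from the course.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups of 2-D points.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
pred_a = knn_predict(train, (1.1, 1.0))
pred_b = knn_predict(train, (5.1, 5.0))
```

KNN has no training step: all the work happens at prediction time, so the choice of k and of the distance measure largely determines its behavior.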
Overfitting
Cross Validation: training data, test data
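As a rough illustration of how cross validation divides samples into training data and test data, the sketch below generates k-fold index splits. The helper name `k_fold_splits` is hypothetical, and the sketch assumes the samples have already been shuffled; in practice a shuffle usually comes first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # held-out fold
        train = indices[:start] + indices[start + size:]   # everything else
        yield train, test
        start += size

# 10 samples, 3 folds: each sample is used for testing exactly once.
splits = list(k_fold_splits(10, 3))
```

Each model is trained on the training indices and evaluated on the held-out fold, and the k scores are then averaged, which gives a less optimistic estimate than testing on the data used for training.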
Confusion Matrix: TP (True Positive), FP (False Positive), FN (False Negative), TN (True Negative), TPR (True Positive Rate), TNR (True Negative Rate), Accuracy
TP+FP+FN+TN = number of samples
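The confusion matrix terms can be computed directly from true and predicted labels. The sketch below uses the standard definitions TPR = TP / (TP + FN), TNR = TN / (TN + FP), and Accuracy = (TP + TN) / (TP + FP + FN + TN); the function name and the label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif t == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn

# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)                        # True Positive Rate
tnr = tn / (tn + fp)                        # True Negative Rate
accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction classified correctly
```

The four counts always sum to the number of samples, which is a quick sanity check on any confusion matrix.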
ROC: Receiver Operating Characteristic
AUC: Area Under the ROC Curve (an AUC near 1 is good)
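One way to see why an AUC near 1 is good: the AUC equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it with that pairwise definition instead of integrating the ROC curve; the function name `auc` and the scores are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: one positive (0.6) is ranked below a negative (0.7).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = auc(y_true, scores)

perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])   # every positive outranks every negative
chance = auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])    # scores carry no information
```

A perfect ranking gives 1.0 and uninformative scores give 0.5. The pairwise loop is quadratic in the number of samples, so it is only meant for small illustrations.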
Cost-sensitive learning
Lift analysis
Clustering
Difference: Clustering is unsupervised learning; classification is supervised learning.
Association Rule
Regression
Underfitting
Overfitting
- Data Preprocessing
Garbage in, garbage out.
Cloud Computing
Parallel Computing