艾特小白鞋 - CSDN Blog

Original: Data Mining (Chapter 9: Association Rule Analysis)

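1.1 The Apriori algorithm (an iterative, level-wise search), the FP-growth algorithm (frequent itemsets), and the Eclat algorithm (depth-first search).
print("Number of rules found:", len(rules))
print("Frequent itemsets found:\n", frequent_itemsets)
# Use the association_rules() function to generate strong association rules.
print("Strong association rules generated:\n", rules)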
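The level-wise search that the excerpt attributes to Apriori can be sketched in plain Python. This is a minimal illustration, not the post's own code: the functions `apriori` and `generate_rules` and the toy transactions are hypothetical, and the excerpt's `association_rules()` call presumably comes from a library rather than from anything like this.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: support(c) for c in singles}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples (hypothetical helper)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent_itemsets = apriori(transactions, min_support=0.6)
rules = generate_rules(frequent_itemsets, min_confidence=0.8)
print("Number of rules found:", len(rules))
```

The prune step relies on the downward-closure property: an itemset can only be frequent if all of its subsets are, which is what keeps the candidate set small at each level.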

2025-02-19 18:20:26 658

Original: Data Mining (Chapter 8: Cluster Analysis)

print("NMI: %0.3f" % metrics.normalized_mutual_info_score(y, y_pred))
print("Adjusted Rand index (ARI): %0.3f" % metrics.adjusted_rand_score(y, y_pred))
print("SSE:", model.inertia_)  # SSE: sum of squared distances from each sample to its nearest centroid.
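To make the SSE value concrete, the following sketch computes the same quantity by hand: the sum of squared distances from each sample to its closest centroid. The function name `sse`, the points, and the centroids are all assumptions made up for this illustration; in the excerpt the value is read from `model.inertia_` instead.

```python
def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared distance to every centroid; keep the smallest one.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


# Hypothetical 2-D samples and centroids, for illustration only.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
centroids = [(1.0, 2.0), (6.0, 5.0)]
sse_value = sse(points, centroids)
print("SSE:", sse_value)
```

A lower SSE means tighter clusters, but it always decreases as the number of clusters grows, so it is usually compared across several cluster counts rather than read in isolation.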

2025-01-21 06:52:36 587

Original: Data Mining (Chapter 7: Ensemble Techniques)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)  # Split into training and test sets.

2025-01-18 22:13:23 573

Original: Data Mining (Chapter 6: Basic Classification and Regression Models)

x = df[['Family', 'Education', 'Securities Account', 'CD Account', 'Online', 'CreditCard']]  # 6 features.
ccol = ['Family', 'Education', 'Securities Account', 'CD Account', 'Online', 'CreditCard']  # 6 features.
print("KNN accuracy from cross-validation: %s" % (acc / kf.get_n_splits()))
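The expression `acc / kf.get_n_splits()` averages per-fold accuracy over the number of folds. The sketch below shows that averaging with a hand-written k-fold split and a tiny nearest-neighbour classifier. Everything in it is an assumption for illustration (the helpers `k_fold_indices` and `knn_predict`, the unshuffled contiguous folds, and the toy data); the excerpt's `kf` is presumably a library splitter.

```python
from collections import Counter


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def knn_predict(train_x, train_y, query, k=1):
    """Majority label among the k training points closest to query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data set, invented for illustration.
x = [(0,), (1,), (4,), (6,), (10,), (11,)]
y = [0, 0, 1, 1, 1, 1]
n_splits = 3

acc = 0.0  # running sum of per-fold accuracies
for train_idx, test_idx in k_fold_indices(len(x), n_splits):
    train_x = [x[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    correct = sum(
        knn_predict(train_x, train_y, x[i]) == y[i] for i in test_idx
    )
    acc += correct / len(test_idx)

mean_acc = acc / n_splits
print("KNN accuracy from cross-validation: %s" % mean_acc)
```

Because the folds here are not shuffled, the first fold trains on a single class and scores poorly. Shuffled or stratified folds are the usual way to avoid that kind of skew.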

2025-01-13 19:51:52 855

Original: Data Mining in Practice (Feature Selection)

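print("Test accuracy of the L1-regularized logistic regression model:\n", selector.estimator_.score(x_test, y_test))
print("Test accuracy of the decision-tree embedded method:\n", selector.estimator_.score(x_test, y_test))
print("Test accuracy with features selected by RFE:\n", rfe_selector.score(x_test, y_test))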

2025-01-08 17:17:24 1212

Original: Data Mining in Practice (Chapter 4: Data Preprocessing)

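result1 = pd.merge(left, right, how='left', on=['A', 'B'])  # Left join; on names the key columns.
df_median = df['数学'].fillna(value=df['数学'].median(), inplace=False)  # Fill missing values in the '数学' (math) column with the median.
result2 = pd.merge(left, right, how='right', on=['A', 'B'])  # Right join.
df_mean = df['数学'].fillna(value=df['数学'].mean(), inplace=False)  # Fill missing values in the '数学' (math) column with the mean.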

2025-01-06 15:03:49 912

Original: Data Mining in Practice (Chapter 3: Data Exploration)

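data = np.rint(features[2])  # np.rint rounds to the nearest integer; np.trunc truncates toward zero, np.ceil rounds up, np.floor rounds down.
print(features.iloc[:, [2, 3]].corr(method='pearson'))  # Pearson correlation; appropriate when the data are approximately normally distributed.
# 7. Minkowski distance: the general definition of distance, with Euclidean distance at p=2, Manhattan distance at p=1, and Chebyshev distance as p approaches infinity; all of them are affected by feature scale.
colors = ['#7FFFD4', '#458B74', '#FFE4C4']  # Custom colors.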
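The relationship between the three distances named in item 7 can be checked with a short sketch. The function `minkowski` and the sample vectors are assumptions introduced here for illustration; the post's own distance code is not shown in the excerpt.

```python
import math


def minkowski(u, v, p):
    """Minkowski distance of order p; p = math.inf gives the Chebyshev distance."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(p):
        # Limit as p grows: the largest coordinate difference dominates.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Hypothetical sample vectors.
u = (0.0, 0.0)
v = (3.0, 4.0)

manhattan = minkowski(u, v, 1)         # sum of absolute differences
euclidean = minkowski(u, v, 2)         # straight-line distance
chebyshev = minkowski(u, v, math.inf)  # largest single difference
print(manhattan, euclidean, chebyshev)
```

Since every variant works on raw coordinate differences, a feature measured in large units dominates the result, which is why features are commonly standardized before distances are computed.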

2025-01-03 19:42:03 655

Original: Data Mining in Practice (Chapter 2: Python Data Mining Modules)

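df1 = pd.DataFrame([['a', 1, 2], ['b', 3, 4], ['c', 7, 8]], columns=['x', 'y', 'z'])  # Create a DataFrame from a 2-D list; columns sets the column labels.
df1 = df.reindex(['李四', '张三', '王五', '陈六'], columns=['数学', '语文', '英语', '计算机'])  # Reorder rows and columns by label.
s1 = pd.Series([0, 1, 2, np.nan], index=['a', 'b', 'c', 'd'])  # index sets the index labels.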

2025-01-01 21:17:35 694 1