Python Data Mining: Affinity Analysis

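This post introduces affinity analysis (association analysis) as applied to product recommendation. It computes support and confidence to evaluate which other products a customer is likely to buy after buying a given one. Using apples and cheese as the example, it shows how to analyze the data in Python and find the most likely purchase combinations.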

The following is based on the book 《python数据挖掘入门与实战》.

1.3 Affinity analysis

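In China this is also called association analysis. It determines how closely related individuals are based on the similarity between samples, which makes it well suited to recommendation.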

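Product recommendation

Idea: if people often bought two products together in the past, they will tend to buy them together in the future.
Rule: if a person buys product X, they are likely to buy product Y as well.
Criteria for judging a rule: support and confidence.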

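Support: the number of times the rule holds divided by the total number of samples, i.e. the fraction of all samples in which the rule holds.
support = number of people who bought both X and Y / total number of people
{X, Y} and {Y, X} have the same support.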

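Confidence
The accuracy of the rule: among the people who bought X, the proportion who also bought Y.
confidence(X -> Y) = number of people who bought both X and Y / number of people who bought X
confidence(Y -> X) = number of people who bought both X and Y / number of people who bought Y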
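
To make the two definitions concrete, here is a minimal sketch on a made-up basket of five customers. The array `toy` and the variable names are hypothetical and have nothing to do with the book's dataset; the point is only that support is symmetric while confidence depends on the direction of the rule.

```python
import numpy as np

# Hypothetical toy data: 5 customers, columns are [X, Y]; 1 = bought, 0 = not bought
toy = np.array([
    [1, 1],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
])

bought_x = toy[:, 0] == 1
bought_y = toy[:, 1] == 1
n_both = int(np.sum(bought_x & bought_y))   # customers who bought both X and Y

support = n_both / len(toy)                 # identical for {X, Y} and {Y, X}
conf_x_to_y = n_both / int(bought_x.sum())  # of those who bought X, share who bought Y
conf_y_to_x = n_both / int(bought_y.sum())  # of those who bought Y, share who bought X

print(support, conf_x_to_y, conf_y_to_x)
```

In this toy case two of the five customers bought both items, three bought X and four bought Y, so the two confidence values differ even though the support is shared.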

import numpy as np
dataset_filename = "affinity_dataset.txt"
data = np.loadtxt(dataset_filename)
n_samples, n_features = data.shape  # rows are samples, columns are items
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

print(data[:5])
# Each row is one sample.
# Each column is one item; here they are bread, milk, cheese, apple and banana, in that order.
# 1 means the customer bought at least one of that item; 0 means the customer did not buy it.

features = ["bread", "milk", "cheese", "apple", "banana"]
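
The file `affinity_dataset.txt` is not included in this post. If you do not have it, a stand-in with the same layout could be generated roughly as follows. This is only a sketch: the file name `affinity_dataset_fake.txt`, the sample count, the seed and the purchase probabilities are all assumptions, and because each item is drawn independently the numbers it produces will not match the results shown later.

```python
import numpy as np

rng = np.random.default_rng(seed=14)  # arbitrary seed, only for reproducibility

n_customers = 100  # assumed sample count
# Hypothetical probability that a customer buys each item:
# bread, milk, cheese, apple, banana
purchase_prob = np.array([0.3, 0.5, 0.4, 0.4, 0.6])

# One row per customer, one column per item; 1 = bought, 0 = did not buy
fake_data = (rng.random((n_customers, len(purchase_prob))) < purchase_prob).astype(int)

np.savetxt("affinity_dataset_fake.txt", fake_data, fmt="%d")

# Round trip: np.loadtxt returns floats by default, which is what the script above sees
reloaded = np.loadtxt("affinity_dataset_fake.txt")
print(reloaded.shape)
```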


Compute the support and confidence of the rule "if a person purchases X, they may buy Y as well":
support = number of people who bought both X and Y / total number of samples
confidence(X -> Y) = number of people who bought both X and Y / number of people who bought X

The passage above is not a quotation either; I wrote it myself...

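def get_num_XY_X(data, X, Y):
    # Count the samples that contain X, and those that contain both X and Y
    num_xy = 0
    num_x = 0
    ix = features.index(X)  # column index of item X
    iy = features.index(Y)  # column index of item Y
    for sample in data:
        if sample[ix] == 1:
            num_x += 1
            if sample[iy] == 1:
                num_xy += 1
    return num_xy, num_x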

def compute_supp_conf(num_total, num_xy, num_x):
    support = num_xy / num_total
    confidence = num_xy / num_x
    return support, confidence

# Rule: if a person buys apples, they may also buy milk
X, Y = "apple", "milk"
num_apple_milk, num_apple_purchase = get_num_XY_X(data, X, Y)
num_total = len(data)
s, c = compute_supp_conf(num_total, num_apple_milk, num_apple_purchase)
print("The support is {0:.1f}%, the confidence is {1:.1f}%".format(100*s,100*c))
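
The explicit loop in `get_num_XY_X` can also be written with NumPy boolean masks. The sketch below is an alternative for illustration, not the code used in this post; the function name `count_xy_x` and the small inline array `sample` are assumptions.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apple", "banana"]

def count_xy_x(data, x, y, features=features):
    """Return (number of rows with both x and y, number of rows with x)."""
    bought_x = data[:, features.index(x)] == 1
    bought_y = data[:, features.index(y)] == 1
    return int(np.sum(bought_x & bought_y)), int(np.sum(bought_x))

# Hypothetical 4-customer sample in the same column order as `features`
sample = np.array([
    [0, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
])

num_xy, num_x = count_xy_x(sample, "apple", "milk")
print(num_xy, num_x)
```

Three of the four hypothetical customers bought apples and two of those also bought milk, so the call returns the pair (2, 3).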

# All rules
from collections import defaultdict
support_dict = defaultdict(float)
confidence_dict = defaultdict(float)
# Note: range(4) covers only the first four items, so banana (index 4) is not included
for i in range(4):
    X = features[i]
    for j in range(4):
        Y = features[j]
        if X == Y:
            continue  # skip rules of the form X -> X
        num_XY, num_X = get_num_XY_X(data, X, Y)
        s, c = compute_supp_conf(num_total, num_XY, num_X)
        support_dict[(X, Y)] = s
        confidence_dict[(X, Y)] = c
        print("Rule: if a person buys {0}, he will also buy {1}".format(X, Y))
        print(" - support: {0:.3f}".format(s))
        print(" - confidence: {0:.3f}".format(c))
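
The double loop above can also be expressed with `itertools.permutations`, which yields every ordered pair of distinct items. The sketch below is one possible variant, not the code that produced the results in this post: it iterates over all five features rather than the first four, and it skips any premise that nobody bought so that the confidence never divides by zero. The helper name `all_rules` and the array `sample` are hypothetical.

```python
from itertools import permutations
import numpy as np

features = ["bread", "milk", "cheese", "apple", "banana"]

def all_rules(data, features):
    """Map each ordered pair (X, Y) to (support, confidence) of the rule X -> Y."""
    bought = data == 1                      # boolean matrix, same shape as data
    n_total = len(data)
    rules = {}
    for ix, iy in permutations(range(len(features)), 2):
        n_x = int(bought[:, ix].sum())
        if n_x == 0:                        # nobody bought X: confidence is undefined
            continue
        n_xy = int((bought[:, ix] & bought[:, iy]).sum())
        rules[(features[ix], features[iy])] = (n_xy / n_total, n_xy / n_x)
    return rules

# Hypothetical 4-customer sample in the same column order as `features`
sample = np.array([
    [0, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
])
rules = all_rules(sample, features)
print(rules[("apple", "milk")])
```

Returning a plain dict keyed by `(X, Y)` keeps the result compatible with the sorting step that follows, which only needs key and value pairs.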

# Find the best rules
from operator import itemgetter
sorted_support = sorted(support_dict.items(), key=itemgetter(1), reverse=True)
sorted_confidence = sorted(confidence_dict.items(), key=itemgetter(1), reverse=True)

def print_top_five(sortlist, sdict, cdict):
    # sortlist holds ((X, Y), value) pairs, already sorted in descending order
    for i in range(5):
        rule = sortlist[i][0]
        print("Rule # %d: if a person buys %s, he may buy %s" % (i, rule[0], rule[1]))
        print("- support: %-10.3f" % sdict[rule])
        print("- confidence: %-.3f" % cdict[rule])

print_top_five(sorted_support, support_dict, confidence_dict)  # top five by support
print_top_five(sorted_confidence, support_dict, confidence_dict)  # top five by confidence

The final printed output looks like this:

# Output of print_top_five(sorted_support, support_dict, confidence_dict)
Rule # 0: if a person buys cheese, he may buy apple
- support: 0.250
- confidence: 0.610
Rule # 1: if a person buys apple, he may buy cheese
- support: 0.250
- confidence: 0.694
Rule # 2: if a person buys bread, he may buy milk
- support: 0.140
- confidence: 0.519
Rule # 3: if a person buys milk, he may buy bread
- support: 0.140
- confidence: 0.304
Rule # 4: if a person buys milk, he may buy apple
- support: 0.090
- confidence: 0.196

# Output of print_top_five(sorted_confidence, support_dict, confidence_dict)
Rule # 0: if a person buys apple, he may buy cheese
- support: 0.250
- confidence: 0.694
Rule # 1: if a person buys cheese, he may buy apple
- support: 0.250
- confidence: 0.610
Rule # 2: if a person buys bread, he may buy milk
- support: 0.140
- confidence: 0.519
Rule # 3: if a person buys milk, he may buy bread
- support: 0.140
- confidence: 0.304
Rule # 4: if a person buys apple, he may buy milk
- support: 0.090
- confidence: 0.250

The conclusion:
if a person buys apples, he may buy cheese
In other words, someone who buys apples is very likely to buy cheese as well.
