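Association rules
Association rules are evaluated with three measures:
Support: the share of all transactions that contain the itemset. The itemset should occur relatively often.
Confidence: the probability that item B appears in a transaction given that item A appears. A high value indicates a strong relationship.
Lift: how many times more often a combination occurs than would be expected from the individual frequencies of its items. A value above 1 means the combination occurs more often than chance alone would produce.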
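The three measures can be computed by hand on a tiny example. The sketch below uses base R only; the `baskets` list and the `support()` helper are hypothetical names made up for illustration, not part of arules.

```r
# Toy transactions (hypothetical data, for illustration only)
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread"),
  c("milk", "butter"),
  c("milk", "bread", "eggs")
)

# Fraction of transactions that contain every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(tx) all(items %in% tx), logical(1)))
}

# Rule under study: {butter} => {milk}
supp_a  <- support("butter", baskets)             # P(A)
supp_b  <- support("milk", baskets)               # P(B)
supp_ab <- support(c("butter", "milk"), baskets)  # P(A and B)

confidence <- supp_ab / supp_a             # P(B | A)
lift       <- supp_ab / (supp_a * supp_b)  # ratio to the chance expectation

print(c(support = supp_ab, confidence = confidence, lift = lift))
```

Here butter appears in 2 of 5 baskets and always together with milk, so the rule has support 0.4, confidence 1 and lift 1.25.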
Receipt data
Loading the data
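library(arules)                 # association rule mining
data("Groceries")               # grocery transactions shipped with arules
summary(Groceries)              # item frequencies and transaction sizes
inspect(sample(Groceries, 5))   # show 5 randomly drawn transactions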
Output:
items
[1] {UHT-milk,
domestic eggs,
brown bread,
coffee,
soda,
canned beer,
liquor (appetizer),
newspapers}
[2] {brown bread,
margarine}
[3] {fruit/vegetable juice,
waffles}
[4] {butter milk,
coffee}
[5] {frozen meals}
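As the output shows, association rules work on data that is not laid out as a matrix. Each "{}" is one transaction, and the number of items differs from one transaction to the next.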
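To make the structure concrete, here is a minimal sketch that builds a transactions object from an ordinary R list. It assumes the arules package is installed; the `baskets` list is hypothetical and is not drawn from Groceries.

```r
library(arules)

# Hypothetical baskets of different lengths
baskets <- list(
  c("brown bread", "margarine"),
  c("frozen meals"),
  c("butter milk", "coffee", "soda")
)

trans <- as(baskets, "transactions")  # coerce the list to a transactions object
size(trans)     # number of items in each transaction
inspect(trans)  # print each transaction in {item, item} form
```

A plain list keeps the ragged shape of the data, which is why it maps more naturally to transactions than a fixed-width data frame would.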
Mining association rules
ar <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.3, target = "rules"))  # tune supp and conf to the data and the business need
inspect(subset(ar,lift > 2.5))
Output:
# Apriori output
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
        0.3    0.1    1 none FALSE            TRUE       5    0.01      1     10  rules FALSE
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 98
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [88 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [125 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
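The log can be cross-checked and the rules ranked with a few more lines. This is a sketch under the same thresholds as above, assuming arules is installed; `top` is a name introduced here for illustration.

```r
library(arules)
data("Groceries")

# Same thresholds as above, with the progress log switched off
ar <- apriori(Groceries,
              parameter = list(supp = 0.01, conf = 0.3, target = "rules"),
              control = list(verbose = FALSE))

# Relative support 0.01 expressed as a number of transactions
0.01 * length(Groceries)  # 98.35 with 9835 transactions

# Rank the rules by lift and keep the five strongest
top <- head(sort(ar, by = "lift"), 5)
inspect(top)
quality(top)  # data frame of the interest measures for these rules
```

The product 98.35 is consistent with the absolute minimum support count of 98 reported in the log. Sorting by lift is one reasonable way to surface rules first; sorting by confidence or support may suit other goals better.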