Overview
- Apriori is an algorithm for mining frequent item sets and learning association rules from transactional databases.
- Apriori uses Breadth-First Search and a Hash Tree Structure to count candidate item sets efficiently.
- If $X$ is a frequent item set, then all subsets of $X$ are frequent item sets.
- If $X$ is an infrequent item set, then all supersets of $X$ are infrequent item sets.
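Both properties follow from one fact: a transaction that contains $X$ also contains every subset of $X$, so the support of an item set can only shrink as the set grows. A minimal Python sketch that checks this on a made-up toy database (the data and the helper names `support` and `support_is_anti_monotone` are hypothetical, for illustration only):

```python
from itertools import combinations

# Hypothetical toy database, used only to illustrate the property.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]

def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support_is_anti_monotone(db):
    """Check support(X) <= support(Y) for every item set X and each subset Y
    of X with one item fewer (smaller subsets follow by induction)."""
    items = sorted(set().union(*db))
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            for sub in combinations(combo, k - 1):
                if support(x, db) > support(frozenset(sub), db):
                    return False
    return True

print(support_is_anti_monotone(transactions))
```

Because support never increases from a subset to a superset, any threshold $\epsilon$ that a superset meets is also met by its subsets, which is exactly the pair of statements above.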
Preliminary
Breadth-First Search
Hash Tree Structure
Pseudo Code
- $T$ is a transactional database, and $\epsilon$ is the support threshold: the minimum number of occurrences an item set needs in order to be called a frequent item set.
- $L_1 \leftarrow \{\text{large } 1\text{-itemsets}\}$ denotes the frequent item sets that contain only one item.
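- $C_k \leftarrow \{a \cup \{b\} \mid a \in L_{k-1} \land b \notin a\} - \{c \mid \{s \mid s \subseteq c \land |s| = k-1\} \not\subseteq L_{k-1}\}$ is the candidate set for level $k$. The first term joins each frequent item set $a \in L_{k-1}$ with one item $b$ that is not already in $a$. The second term collects the joined sets $c$ that have at least one $(k-1)$-item subset $s$ outside $L_{k-1}$; subtracting them is safe because such sets cannot be frequent, which follows from the last property in the Overview (every superset of an infrequent item set is infrequent).
- $C_t \leftarrow \{c \mid c \in C_k \land c \subseteq t\}$, for each transaction $t \in T$, is the set of candidates that actually occur in transaction $t$.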
- The remaining steps count how many times each candidate occurs, keep the candidates whose count reaches $\epsilon$ as $L_k$, and iterate with $k \leftarrow k + 1$ until $L_k$ is empty.
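The level-wise (breadth-first) loop described above can be sketched in Python. This is a minimal sketch, not a reference implementation: the function name `apriori` is hypothetical, the item $b$ in the join is drawn from the items occurring in $L_{k-1}$ (an assumption, since the formula only requires $b \notin a$; after pruning the result is the same), and candidates are counted with a plain dictionary instead of the hash tree mentioned in the Overview.

```python
from itertools import combinations

def apriori(transactions, eps):
    """Return {itemset: support} for every item set occurring at least eps times.

    Simplification: candidates are counted with a dict, not a hash tree.
    """
    db = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: 0 for c in candidates}
        for t in db:
            # C_t: the candidates contained in transaction t
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # L_k: keep candidates whose count reaches eps
        return {c: n for c, n in counts.items() if n >= eps}

    # L_1: frequent item sets with exactly one item
    level = count({frozenset([i]) for t in db for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        items = set().union(*prev)
        # Join: a ∪ {b} with a in L_{k-1} and b not in a
        joined = {a | {b} for a in prev for b in items if b not in a}
        # Prune: drop c if some (k-1)-subset of c is not in L_{k-1}
        candidates = {c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent
```

Called on the seven transactions of the Example with `eps = 3`, it returns the four single items and the pairs {1,2}, {2,3}, {2,4} and {3,4}, matching the hand calculation.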
Example
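The example uses the following transactional database $T$ of seven transactions and the support threshold $\epsilon = 3$ (the smallest support that is kept in the steps below).
Transaction |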
---|
{1,2,3,4} |
{1,2,4} |
{1,2} |
{2,3,4} |
{2,3} |
{3,4} |
{2,4} |
Step 1
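Counting the occurrences of each single item gives $L_1 \leftarrow \{\text{large } 1\text{-itemsets}\}$; every item reaches the threshold, so all four are frequent: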
Item | Support |
---|---|
{1} | 3 |
{2} | 6 |
{3} | 4 |
{4} | 5 |
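Step 1 amounts to a single counting pass over $T$. A short Python sketch (the variable names are illustrative):

```python
from collections import Counter

# The seven transactions of the example database T.
T = [{1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4}]

# Each transaction is a set, so an item is counted at most once per transaction.
item_support = Counter(item for t in T for item in t)

for item in sorted(item_support):
    print({item}, item_support[item])
```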
Step 2
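Apply $C_k \leftarrow \{a \cup \{b\} \mid a \in L_{k-1} \land b \notin a\} - \{c \mid \{s \mid s \subseteq c \land |s| = k-1\} \not\subseteq L_{k-1}\}$ with $k = 2$, then count the support of each candidate: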
Item | Support |
---|---|
{1,2} | 3 |
{1,3} | 1 |
{1,4} | 2 |
{2,3} | 3 |
{2,4} | 4 |
{3,4} | 3 |
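The candidate table for $k = 2$ can be reproduced with a small sketch of the join and prune formula followed by a support count. The helper name `candidates` is hypothetical, and $b$ is drawn from the items of $L_{k-1}$ as an assumption:

```python
from itertools import combinations

T = [{1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4}]
L1 = [frozenset({i}) for i in (1, 2, 3, 4)]  # all four items are frequent in Step 1

def candidates(prev, k):
    """C_k: join a ∪ {b} (b not in a), then drop every c that has a
    (k-1)-subset outside prev."""
    items = set().union(*prev)
    joined = {a | {b} for a in prev for b in items if b not in a}
    return {c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}

C2 = candidates(set(L1), 2)
# Support of each candidate: number of transactions containing it.
support = {c: sum(1 for t in T if c <= t) for c in C2}
for c in sorted(C2, key=sorted):
    print(set(c), support[c])
```

For $k = 2$ every 1-item subset is in $L_1$, so nothing is pruned and all six pairs are counted.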
So the frequent item sets $L_2$ (the candidates whose support reaches $\epsilon$) are:
Item | Support |
---|---|
{1,2} | 3 |
{2,3} | 3 |
{2,4} | 4 |
{3,4} | 3 |
The remaining candidates are pruned: they are infrequent, so no larger set that contains one of them can be frequent. The pruned sets are:
Item | Support |
---|---|
{1,3} | 1 |
{1,4} | 2 |
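Splitting the counted candidates into $L_2$ and the pruned sets is a single threshold comparison. A minimal sketch, assuming $\epsilon = 3$ (inferred from the tables, where support 3 is kept and support 2 is not):

```python
eps = 3  # assumed threshold, inferred from the example tables

# Supports of the k = 2 candidates, copied from the candidate table.
C2_support = {frozenset({1, 2}): 3, frozenset({1, 3}): 1, frozenset({1, 4}): 2,
              frozenset({2, 3}): 3, frozenset({2, 4}): 4, frozenset({3, 4}): 3}

L2 = {c: n for c, n in C2_support.items() if n >= eps}      # frequent
pruned = {c: n for c, n in C2_support.items() if n < eps}   # infrequent

for name, table in (("L2", L2), ("pruned", pruned)):
    print(name, sorted(sorted(c) for c in table))
```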
Step 3
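$C_k \leftarrow \{a \cup \{b\} \mid a \in L_{k-1} \land b \notin a\} - \{c \mid \{s \mid s \subseteq c \land |s| = k-1\} \not\subseteq L_{k-1}\}$ with $k = 3$.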
First, the join $\{a \cup \{b\} \mid a \in L_{k-1} \land b \notin a\}$ gives:
Item |
---|
{1,2,3} |
{1,2,4} |
{1,3,4} |
{2,3,4} |
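Because each $a \in L_2$ has two items and exactly one item $b$ is added, every joined set has exactly three items. A short sketch of the join for $k = 3$, assuming $b$ ranges over the items that occur in $L_2$:

```python
# Frequent 2-item sets from Step 2.
L2 = {frozenset(s) for s in ({1, 2}, {2, 3}, {2, 4}, {3, 4})}

items = set().union(*L2)
# Join: a ∪ {b} for every a in L2 and every item b not already in a.
joined = {a | {b} for a in L2 for b in items if b not in a}

for c in sorted(joined, key=sorted):
    print(set(c))
```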
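Then we remove every candidate that contains a pruned set ({1,3} or {1,4}), that is, every candidate with a 2-item subset outside $L_2$. Only {2,3,4} remains, with the following support: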
Item | Support |
---|---|
{2,3,4} | 2 |
Finally, the support of {2,3,4} is 2, which is below $\epsilon$, so $L_3$ is empty and the calculation is over.
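The prune, count and termination check of Step 3 can be sketched as follows. This assumes $\epsilon = 3$ and takes the join result for $k = 3$ to be {1,2,3}, {1,2,4}, {1,3,4} and {2,3,4}, which is what the join formula produces from $L_2$:

```python
from itertools import combinations

T = [{1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4}]
eps = 3  # assumed threshold, inferred from Step 2
L2 = {frozenset(s) for s in ({1, 2}, {2, 3}, {2, 4}, {3, 4})}
joined = {frozenset(s) for s in ({1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4})}

# Prune: keep c only if every 2-item subset of c is in L2.
C3 = {c for c in joined if all(frozenset(s) in L2 for s in combinations(c, 2))}

# Count the surviving candidates over T, then keep those reaching eps.
C3_support = {c: sum(1 for t in T if c <= t) for c in C3}
L3 = {c: n for c, n in C3_support.items() if n >= eps}

print(C3_support, L3)
```

An empty `L3` is the stopping condition: with no frequent 3-item sets, the join for $k = 4$ has nothing to start from.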