HNUST Data Mining Course Project: Experiment 2, Design and Application of the Close Algorithm

I. Experiment Overview

1. Requirements

[Figure: experiment requirements]

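2. Principle

Every closed subset of a frequent closed itemset is itself frequent, and every closed superset of an infrequent closed itemset is itself infrequent. The question of which itemsets are frequent can therefore be studied on the lattice of closed itemsets. Experiments show that, on certain kinds of data, this reduces the number of database scans. Close is a frequent-itemset mining algorithm whose goal is to find the frequent closed itemsets of a data set. Working with frequent closed itemsets avoids generating large numbers of unnecessary candidate itemsets, which lowers the cost of the subsequent association-rule mining and saves time and computing resources.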
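To make the notion of a closed itemset concrete, here is a minimal sketch on a made-up four-transaction database. The class name, the method names and the data are assumptions chosen for illustration and are not part of the experiment; `Set.of` and `List.of` assume Java 9 or later. An itemset is treated as closed when no proper superset has the same support.

```java
import java.util.*;

class ClosedItemsetDemo {
    // Toy transaction database (hypothetical data, not the experiment's dataset).
    static final List<Set<String>> DB = List.of(
            Set.of("A", "B", "C"),
            Set.of("A", "B"),
            Set.of("A", "B", "D"),
            Set.of("C", "D"));

    // Support count: number of transactions that contain every item of the itemset.
    static int support(Set<String> itemset) {
        int count = 0;
        for (Set<String> t : DB) {
            if (t.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // An itemset is closed if adding any further item strictly lowers its support.
    static boolean isClosed(Set<String> itemset) {
        Set<String> allItems = new TreeSet<>();
        for (Set<String> t : DB) {
            allItems.addAll(t);
        }
        int base = support(itemset);
        for (String item : allItems) {
            if (itemset.contains(item)) {
                continue;
            }
            Set<String> bigger = new HashSet<>(itemset);
            bigger.add(item);
            if (support(bigger) == base) {
                return false; // a superset with equal support exists
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // {A} always occurs together with B, so it is not closed; {A, B} is.
        System.out.println("{A} closed?    " + isClosed(Set.of("A")));
        System.out.println("{A, B} closed? " + isClosed(Set.of("A", "B")));
    }
}
```

In this toy data {A} and {A, B} both have support 3, which is why only {A, B} needs to be stored: the support of {A} can be recovered from it.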

3. Program Flowchart

Figure 2-1: The Close algorithm

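II. Data Structures and Key Code

1. Data Structures
List<List<String>> transactions // one list of items per transaction record
Map<Set<String>, Integer> FrequentItemsets // all frequent itemsets with their support counts
Map<Set<String>, Integer> frequentItemsets // generators found so far with their support counts
Map<Set<String>, Set<String>> Closures // each generator and its closure
Map<Set<String>,Integer> closures // closed itemsets and their support counts
2. Key Code
// Compute the closure of each generator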
public static Set<String> calculateClosure(Set<String> item, List<List<String>> transactions) {
    Set<String> currentClosure = null;
    // Scan every transaction
    for (List<String> transaction : transactions) {
        // The closure is the intersection of all transactions that contain the itemset
        if (transaction.containsAll(item)) {
            if (currentClosure == null) {
                currentClosure = new HashSet<>(transaction);
            } else {
                currentClosure.retainAll(transaction);
            }
        }
    }
    return currentClosure != null ? currentClosure : new HashSet<>();
}
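As a usage illustration, the sketch below applies the same intersection idea as calculateClosure to a hypothetical three-transaction database. The class name and the data are assumptions made for this example, and a TreeSet is used only so that the printed order is stable (Java 9 or later is assumed for `List.of` and `Set.of`).

```java
import java.util.*;

class ClosureUsageDemo {
    // Same idea as calculateClosure: intersect all transactions that contain the itemset.
    static Set<String> closure(Set<String> itemset, List<List<String>> transactions) {
        Set<String> result = null;
        for (List<String> t : transactions) {
            if (t.containsAll(itemset)) {
                if (result == null) {
                    result = new TreeSet<>(t); // first matching transaction
                } else {
                    result.retainAll(t);       // keep only the items shared so far
                }
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        // Hypothetical data: A appears only together with B, C appears only together with B.
        List<List<String>> db = List.of(
                List.of("A", "B", "C"),
                List.of("A", "B"),
                List.of("B", "C"));
        System.out.println(closure(Set.of("A"), db)); // [A, B]
        System.out.println(closure(Set.of("C"), db)); // [B, C]
        System.out.println(closure(Set.of("B"), db)); // [B]
    }
}
```

An itemset that occurs in no transaction gets an empty closure here, mirroring the fallback in the method above.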
// Derive all frequent itemsets from the frequent closed itemsets
while (maxLength > 1) {
    Iterator<Set<String>> iterator = closures.keySet().iterator();
    while (iterator.hasNext()) {
        Set<String> key = iterator.next();
        if (key.size() == maxLength) {
            List<Set<String>> subSetsk = generateKsubsets(key);
            System.out.println(key + "======>SUB " + subSetsk);
            for (Set<String> sub : subSetsk) {
                if (!closures.containsKey(sub)) {
                    System.out.println(key + " newAdd => " + sub + "=>" + closures.get(key));
                    endFrequentItemsets.add(sub);
                    // A subset can be reached from several supersets; keep the largest support
                    FrequentItemsets.merge(sub, closures.get(key), Math::max);
                    medium.merge(sub, closures.get(key), Math::max);
                    System.out.println("medium=>" + medium);
                }
            }
        }
    }
    // Promote the newly derived subsets so that the next, shorter level can expand them
    for (Set<String> newClosure : medium.keySet()) {
        closures.put(newClosure, medium.get(newClosure));
    }
    maxLength--;
}
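The loop above walks the closed itemsets level by level. The same result can be stated more directly: the support of a frequent itemset equals the largest support among the frequent closed itemsets that contain it. The sketch below shows that formulation only; it is not the author's loop, and the class name, method name and support values are assumptions chosen for illustration (Java 9 or later for `Set.of`).

```java
import java.util.*;

class SupportFromClosedDemo {
    // Support of an itemset = the largest support among the closed itemsets containing it.
    // Returns 0 when no frequent closed itemset contains it, i.e. the itemset is not frequent.
    static int supportOf(Set<String> itemset, Map<Set<String>, Integer> closed) {
        int best = 0;
        for (Map.Entry<Set<String>, Integer> e : closed.entrySet()) {
            if (e.getKey().containsAll(itemset)) {
                best = Math.max(best, e.getValue());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical frequent closed itemsets with support counts, for the
        // database {A,B,C}, {A,B}, {B,C} and a minimum support count of 2.
        Map<Set<String>, Integer> closed = new HashMap<>();
        closed.put(Set.of("B"), 3);
        closed.put(Set.of("A", "B"), 2);
        closed.put(Set.of("B", "C"), 2);

        System.out.println(supportOf(Set.of("A"), closed));      // 2, inherited from {A, B}
        System.out.println(supportOf(Set.of("B"), closed));      // 3, {B} is closed itself
        System.out.println(supportOf(Set.of("A", "C"), closed)); // 0, not frequent
    }
}
```

Taking the maximum matters because a non-closed itemset may sit below several closed supersets with different supports, and only the largest one is its own support.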
3. Complete Code

The program reads its data from dataset.txt. The part I found hardest at the time was the pruning step; I wrote this code quite a while ago, haha.

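(1) Find the candidate 1-itemsets.
(2) Scan the database to obtain the candidate closed itemsets.
(3) Prune: delete every candidate closed itemset whose support is below the minimum support.
(4) This yields the frequent closed itemsets. Join them with themselves to obtain the candidate i-itemsets of the next level, and continue in this way until, for some r, the set of candidate closed r-itemsets is empty; the algorithm then stops.
(5) Derive the frequent itemsets from the frequent closed itemsets. First count the items in each closed itemset in FC, put all itemsets with the same number of items i into FC_i, and record the largest size as k. Then, starting from i = k, take every itemset in FC_i and find all of its (i-1)-item subsets. Each subset that does not already belong to FC_(i-1) is added to it. Repeat down to i = 2, at which point all frequent itemsets have been found.
(6) Mine the association rules.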
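Step (6) relies on the usual definition of confidence, confidence(X => Y) = support(X ∪ Y) / support(X). The sketch below computes it from a table of support counts; the class name, method name and the numbers are hypothetical and serve only to illustrate the formula (Java 9 or later for `Set.of`).

```java
import java.util.*;

class RuleConfidenceDemo {
    // confidence(X => Y) = support(X ∪ Y) / support(X)
    static double confidence(Set<String> antecedent, Set<String> consequent,
                             Map<Set<String>, Integer> supports) {
        Set<String> union = new HashSet<>(antecedent);
        union.addAll(consequent);
        return (double) supports.get(union) / supports.get(antecedent);
    }

    public static void main(String[] args) {
        // Hypothetical support counts for the frequent itemsets of a small database.
        Map<Set<String>, Integer> supports = new HashMap<>();
        supports.put(Set.of("A"), 2);
        supports.put(Set.of("B"), 3);
        supports.put(Set.of("A", "B"), 2);

        double minConfidence = 0.6;
        double ab = confidence(Set.of("A"), Set.of("B"), supports); // 2 / 2
        double ba = confidence(Set.of("B"), Set.of("A"), supports); // 2 / 3
        System.out.println("A => B kept? " + (ab >= minConfidence));
        System.out.println("B => A kept? " + (ba >= minConfidence));
    }
}
```

The lookup assumes that the support of every antecedent is available, which is exactly what step (5) is meant to guarantee.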


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

/**
 * Created by 23222 on 2023/12/18.
 */
public class Close {
//    public static void main(String )
    public static void main(String[] args){
                    double minSupport = 0.6;
                    double minConfidence=0.6;
//            Scanner scanner = new Scanner(System.in);
//
//            System.out.print("Enter the minSupport: ");
//            double minSupport = scanner.nextDouble();
//
//            System.out.print("Enter the minConfidence: ");
//            double minConfidence = scanner.nextDouble();
//
            // Read the transaction database
            String filename = "../2-Close/dataset.txt";
            List<List<String>> transactions = readTransactions(filename);
            for (List<String> transaction : transactions) {
                System.out.println(transaction);
            }

            // Generate the frequent itemsets
            Map<Set<String>, Integer> FrequentItemsets = generateFrequentItemsets(transactions, minSupport);
            System.out.println(FrequentItemsets.entrySet());

            // Generate the association rules
            generateAssociationRules(minConfidence, FrequentItemsets);


        }

    public static List<List<String>> readTransactions(String filename){
            List<List<String>> transactions = new ArrayList<>();
            try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
                String line;
                boolean firstLine = true;
                while ((line = br.readLine()) != null) {
                    if (firstLine) {
                        firstLine = false;
                        continue;
                    }

                    String[] parts = line.trim().split("\\s+");
                    if (parts.length < 2) {
                        continue; // skip blank or malformed lines
                    }
                    String[] items = parts[1].split("、");
                    transactions.add(Arrays.asList(items));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }

            return transactions;
        }

    public static Map<Set<String>, Integer> generateFrequentItemsets(List<List<String>> transactions, double minSupport) {
            Map<Set<String>, Integer> FrequentItemsets = new HashMap<>();
            Map<Set<String>, Integer> frequentItemsets = new HashMap<>();
            Map<Set<String>, Set<String>> Closures = new HashMap<>();
            Map<Set<String>,Integer> closures = new HashMap<>();

            // Collect the 1-itemsets and count their initial support
            for (List<String> transaction : transactions) {
                for (String item : transaction) {
                    Set<String> candidate = new HashSet<>();
                    candidate.add(item);
                    frequentItemsets.put(candidate,frequentItemsets.getOrDefault(candidate, 0) + 1);
                }
            }

            System.out.println("-------------------FCC1--------------------------------");
            for (Map.Entry<Set<String>, Integer> entry : frequentItemsets.entrySet()) {
                Set<String> itemset = entry.getKey();
                Closures.put(itemset,calculateClosure(itemset,transactions));
                int support = entry.getValue();
                System.out.println(itemset + " => " + support);
                System.out.println(calculateClosure(itemset,transactions));
                System.out.println();
            }
            System.out.println("--------------------------------------------------------");


            // Keep only the 1-itemsets whose support is at least minSupport
            frequentItemsets.keySet().removeIf(itemset -> {
                double support = (double) frequentItemsets.get(itemset) / transactions.size();
                return support < minSupport;
            });

            // Key set of frequentItemsets
            Set<Set<String>> frequentKeys = frequentItemsets.keySet();

            // Keep only the closures of the frequent generators (set intersection via retainAll)
            Closures.keySet().retainAll(frequentKeys);


            // Print each generator and its closure
            for (Map.Entry<Set<String>, Set<String>> entry : Closures.entrySet()) {
                Set<String> key = entry.getKey();
                Set<String> value = entry.getValue();
                System.out.println("Key: " + key);
                System.out.println("Value: " + value);
                System.out.println("------------");
            }

             Map<Set<String>, Integer> ffrequentItemsets = new HashMap<>(frequentItemsets);



        System.out.println("-------------------FC1--------------------------------");
            for (Map.Entry<Set<String>, Integer> entry : frequentItemsets.entrySet()) {
                Set<String> itemset = entry.getKey();
                closures.put(Closures.get(itemset),entry.getValue());
                int support = entry.getValue();
                System.out.println(itemset + " => " + support);
            }
            System.out.println("--------------------------------------------------------");

            int k = 2;
            while (!ffrequentItemsets.isEmpty()) {
                Map<Set<String>, Integer> candidateItemsets = generateFCC(ffrequentItemsets.keySet(), k);

                ffrequentItemsets.clear();

                System.out.println("------------------" + k + "-connection---------------------------");
                if (!candidateItemsets.isEmpty()) {
                    // Count the support of each candidate k-itemset
                    for (List<String> transaction : transactions) {
                        for (Set<String> itemset : candidateItemsets.keySet()) {
                            if (transaction.containsAll(itemset)) {
                                candidateItemsets.put(itemset, candidateItemsets.get(itemset) + 1);
                            }
                        }
                    }
                    for (Map.Entry<Set<String>, Integer> entry : candidateItemsets.entrySet()) {
                        Set<String> itemset = entry.getKey();
                        int support = entry.getValue();
                        System.out.println(itemset + " => " + support);

                    }
                } else System.out.println("[empty]");
                System.out.println("-------------------------------------------------------");


                System.out.println("-----------------------------filter-----------------------");
                Iterator<Map.Entry<Set<String>, Integer>> iterator = candidateItemsets.entrySet().iterator();
                while (iterator.hasNext()) {
                    Map.Entry<Set<String>, Integer> entry = iterator.next();
                    Set<String> itemset = entry.getKey();
                    int flag = 0;
                    List<Set<String>> Sp = generateKsubsets(itemset);
                    System.out.println("itemset=> "+itemset);
                    for (Set<String> sp : Sp) {
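                        Set<String> closure = Closures.get(sp);
                        System.out.println("closure of " + sp + ": " + closure);
                        // Prune the candidate if it already lies inside the closure of one of its (k-1)-subsets
                        if (closure != null && closure.containsAll(itemset)) {
                            iterator.remove(); // safe removal through the iterator
                            flag = 1;
                            break;
                        }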
                    }
                    if (flag == 1) System.out.println("----------------------------------->"+itemset+"=>DELETE!");
                }


                for (Map.Entry<Set<String>, Integer> entry : candidateItemsets.entrySet()) {
                    Set<String> itemset = entry.getKey();
                    int support = entry.getValue();
                    System.out.println(itemset + " => " + support);

                }


                System.out.println("closure and support of each generator");
                for (Map.Entry<Set<String>, Integer> entry : candidateItemsets.entrySet()) {
                    Set<String> itemset = entry.getKey();
                    Closures.put(itemset,calculateClosure(itemset,transactions));
                    int support = entry.getValue();
                    System.out.println(itemset + " => " + support+"     "+calculateClosure(itemset,transactions));

                }


                // Drop candidates that never occur in the database
                candidateItemsets.keySet().removeIf(itemset -> candidateItemsets.get(itemset) == 0);

                System.out.println("prune");
                // Drop candidates whose support is below minSupport
                candidateItemsets.keySet().removeIf(itemset -> {
                    double support = (double) candidateItemsets.get(itemset) / transactions.size();
                    return support < minSupport;
                });



                Closures.clear();
                for (Map.Entry<Set<String>, Integer> entry : candidateItemsets.entrySet()) {
                    Set<String> itemset = entry.getKey();
                    int support = entry.getValue();
                    Set<String> closure = calculateClosure(itemset, transactions);
                    Closures.put(itemset, closure);
                    // Record the closed itemset (the closure), which has the same support as its generator
                    closures.put(closure, support);
                    System.out.println(itemset + " => " + support + "     " + closure);

                }


                frequentItemsets.putAll(candidateItemsets);
                ffrequentItemsets.putAll(candidateItemsets);
           /* for (Set<String> itemset : frequentItemsets.keySet()) {
                int support = frequentItemsets.get(itemset);
                System.out.println("f-频繁项集: " + itemset + ", 支持度: " + support);
            }*/


                System.out.println("-------------------FC " + k + "------------------------------");
                if (!candidateItemsets.isEmpty()) {
                    for (Map.Entry<Set<String>, Integer> entry : ffrequentItemsets.entrySet()) {
                        Set<String> itemset = entry.getKey();
                        int support = entry.getValue();
                        System.out.println(itemset + " => " + support);
                    }
                } else System.out.println("[empty]");
                System.out.println("--------------------------------------------------------");
                k++;
            }
        // Sort the closed itemsets by size before printing them
        List<Map.Entry<Set<String>, Integer>> sortedclosures = new ArrayList<>(closures.entrySet());

        Collections.sort(sortedclosures, new Comparator<Map.Entry<Set<String>, Integer>>() {
            @Override
            public int compare(Map.Entry<Set<String>, Integer> entry1, Map.Entry<Set<String>, Integer> entry2) {
                int length = Integer.compare(entry1.getKey().size(), entry2.getKey().size());
                if (length != 0) {
                    return length; // sort by size first
                } else {
                    // same size: order by hash code; Integer.compare avoids the overflow of plain subtraction
                    return Integer.compare(entry1.getKey().hashCode(), entry2.getKey().hashCode());
                }
            }
        });

        for (Map.Entry<Set<String>, Integer> entry : sortedclosures) {
            ffrequentItemsets.put(entry.getKey(), entry.getValue());
        }

        int maxLength=0;
        System.out.println("-------------------[all closed itemsets]-----------------");
        for (Map.Entry<Set<String>, Integer> entry : sortedclosures) {
            if (entry.getKey().size()>maxLength) maxLength=entry.getKey().size();
            System.out.println(entry.getKey());
        }
        FrequentItemsets.putAll(closures);
        System.out.println("maxLength=>"+maxLength);
        System.out.println("closures=>"+closures);
/*        for (Map.Entry<Set<String>, Integer> entry : closures.entrySet()) {
            Set<String> key = entry.getKey();
            Integer value = entry.getValue();
            System.out.println("Key: " + key + ", Value: " + value);
        }*/

        System.out.println("FrequentItemsets=>"+FrequentItemsets);
        Set<Set<String>> endFrequentItemsets = new HashSet<>(closures.keySet());
        Map<Set<String>, Integer> medium = new HashMap<>();
        // Derive all frequent itemsets from the frequent closed itemsets
        while (maxLength > 1) {
            Iterator<Set<String>> iterator = closures.keySet().iterator();
            while (iterator.hasNext()) {
                Set<String> key = iterator.next();
                if (key.size() == maxLength) {
                    List<Set<String>> subSetsk = generateKsubsets(key);
                    System.out.println(key + "======>SUB " + subSetsk);
                    for (Set<String> sub : subSetsk) {
                        if (!closures.containsKey(sub)) {
                            System.out.println(key + " newAdd => " + sub + "=>" + closures.get(key));
                            endFrequentItemsets.add(sub);
                            // A subset can be reached from several supersets; keep the largest support
                            FrequentItemsets.merge(sub, closures.get(key), Math::max);
                            medium.merge(sub, closures.get(key), Math::max);
                            System.out.println("medium=>" + medium);
                        }
                    }
                }
            }
            // Promote the newly derived subsets so that the next, shorter level can expand them
            for (Set<String> newClosure : medium.keySet()) {
                closures.put(newClosure, medium.get(newClosure));
            }
            maxLength--;
        }
        System.out.println("This is end");
        List<Set<String>> list=new ArrayList<>(endFrequentItemsets);
        Collections.sort(list, new Comparator<Set<String>>() {
            @Override
            public int compare(Set<String> entry1, Set<String> entry2) {
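                int length = Integer.compare(entry1.size(), entry2.size());
                if (length != 0) {
                    return length; // sort by size first
                } else {
                    // same size: order by hash code; Integer.compare avoids the overflow of plain subtraction
                    return Integer.compare(entry1.hashCode(), entry2.hashCode());
                }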
            }
        });
        System.out.println("END all=>"+endFrequentItemsets);
        System.out.println("END all=>"+FrequentItemsets);

        return FrequentItemsets;

        }

    public static Set<String> calculateClosure(Set<String> item, List<List<String>> transactions) {
        Set<String> currentClosure = null;
        // Scan every transaction
        for (List<String> transaction : transactions) {
            // The closure is the intersection of all transactions that contain the itemset
            if (transaction.containsAll(item)) {
                if (currentClosure == null) {
                    currentClosure = new HashSet<>(transaction);
                } else {
                    currentClosure.retainAll(transaction);
                }
            }
        }
        return currentClosure != null ? currentClosure : new HashSet<>();
    }

    // Generate the candidate k-itemsets
    public static Map<Set<String>, Integer> generateFCC(Set<Set<String>> frequentItemsets, int k) {
        Map<Set<String>, Integer> candidateItemsets = new HashMap<>();

        for (Set<String> itemset1 : frequentItemsets) {
            for (Set<String> itemset2 : frequentItemsets) {
                if (k == 2) {
                    Set<String> connection = Connection(itemset1, itemset2);
                    if (connection.size() == k && hasInfrequentSubsets(connection, frequentItemsets, k - 1)) {
                        candidateItemsets.put(connection, 0);
                    }
                } else if (k!=1){
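                    // Sort both itemsets so that the (k-2)-prefix comparison does not depend on HashSet iteration order
                    List<String> list1 = new ArrayList<>(itemset1);
                    List<String> list2 = new ArrayList<>(itemset2);
                    Collections.sort(list1);
                    Collections.sort(list2);
                    List<String> subList1 = list1.subList(0, k - 2);
                    List<String> subList2 = list2.subList(0, k - 2);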

                    if (subList1.equals(subList2)) {
                        Set<String> connection = Connection(itemset1, itemset2);
                        if (connection.size() == k && hasInfrequentSubsets(connection, frequentItemsets, k - 1)) {
                            candidateItemsets.put(connection, 0);
                        }
                    }
                }
            }
        }
        return candidateItemsets;
    }

    private static Set<String> Connection(Set<String> itemset1, Set<String> itemset2) {
        Set<String> connection = new HashSet<>(itemset1);
        connection.addAll(itemset2);
        return connection;
    }

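    // Note: despite its name, this returns true when every (k-1)-subset of the candidate is frequent
    private static boolean hasInfrequentSubsets(Set<String> connection, Set<Set<String>> frequentItemsets, int k) {
        List<String> list = new ArrayList<>(connection);
        for (int i = 0; i < list.size(); i++) {
            // Build a (k-1)-subset by dropping the i-th item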
            List<String> subList = new ArrayList<>(list);
            subList.remove(i);
            Set<String> subsetSet = new HashSet<>(subList);
            if (!containsSubset(frequentItemsets, subsetSet)) {
                return false;
            }
        }
        return true;
    }

    // Is the (k-1)-subset contained in some itemset of F_i? One containing itemset is enough.
    private static boolean containsSubset(Set<Set<String>> frequentItemsets, Set<String> subsetSet) {
        for (Set<String> itemset : frequentItemsets) {
            if (itemset.containsAll(subsetSet)) {
                return true;
            }
        }
        return false;
    }

    private static List<Set<String>> generateSubsets(Set<String> itemset) {
        List<Set<String>> subsets = new ArrayList<>();
        for (int i = 0; i < (1 << itemset.size()); i++) {
            Set<String> subset = new HashSet<>();
            int index = 0;
            for (String item : itemset) {
                if ((i & (1 << index)) >0) {
                    subset.add(item);
                }
                index++;
            }
            if (subset.size() > 0 && subset.size() <=itemset.size()) {
                subsets.add(subset);
            }
        }

        return subsets;
    }

    private static List<Set<String>> generateKsubsets(Set<String> itemset) {
        List<String> list = new ArrayList<>(itemset);
        List<Set<String>> set = new ArrayList<>();

        for (int i = 0; i < list.size(); i++) {
            // Build a (k-1)-subset by dropping the i-th item
            List<String> subList = new ArrayList<>(list);
            subList.remove(i);
            Set<String> subsetSet = new HashSet<>(subList);
            set.add(subsetSet);
        }
        return set;
    }

    public static Map<Set<String>, Integer> genrateMaxF(Map<Set<String>, Integer> FrequentItemsets){
        Map<Set<String>, Integer> maxFrequentItemsets=new HashMap<>();
        for (Map.Entry<Set<String>, Integer> entry : FrequentItemsets.entrySet()) {
            Set<String> itemset = entry.getKey();
            int support=entry.getValue();
            boolean isMax = true;

            // Check whether itemset is contained in another frequent itemset
            for (Map.Entry<Set<String>, Integer> entry2 : FrequentItemsets.entrySet()) {
                Set<String> otherItemset = entry2.getKey();
                if (otherItemset.equals(itemset)) {
                    continue;
                }

                if (otherItemset.containsAll(itemset)) {
                    isMax = false;
                    break;
                }
            }
            // An itemset that no other frequent itemset contains is a maximal frequent itemset
            if (isMax) {
                maxFrequentItemsets.put(itemset,support);
            }
        }

        System.out.println("-------------------[maximal frequent itemsets: not contained in any other frequent itemset]-----------------");
        for (Map.Entry<Set<String>, Integer> entry : maxFrequentItemsets.entrySet()) {
            System.out.println(entry.getKey());
        }
        System.out.println("-----------------------------------------------------------------------------");



        return maxFrequentItemsets;
    }

    public static void generateAssociationRules(double minConfidence,Map<Set<String>, Integer> FrequentItemsets) {
        Map<Set<String>, Set<String>> rules = new HashMap<>();
        for (Set<String> itemset : FrequentItemsets.keySet()) { // iterate over every frequent itemset
            if (itemset.size() > 1) {
                List<Set<String>> subsets = generateSubsets(itemset);

                System.out.println("-------------[frequent itemset and its subsets]-------------------------------");
                System.out.println(itemset);
                System.out.println(subsets);
                System.out.println("------------------------------------------------------------------");

                System.out.println("------------------------------------------------------------------");
                System.out.println(itemset);
                for (Set<String> subset : subsets) {
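                    if (subset.equals(itemset)) {
                        continue;
                    }
                    Set<String> remaining = new HashSet<>(itemset);
                    remaining.removeAll(subset);
                    Integer subsetSupportCount = FrequentItemsets.get(subset); // support count of the antecedent
                    if (subsetSupportCount == null) {
                        continue; // antecedent support unknown: skip it instead of failing on unboxing
                    }
                    double confidence = (double) FrequentItemsets.get(itemset) / subsetSupportCount;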
                    if (confidence >= minConfidence) {
                        System.out.println(subset + " => " + remaining);
//                        rules.put(subset, remaining);
//                        System.out.println(rules);
                    }
                }
            }
        }

    }

}

dataset.txt

序号 商品
1 短裤、帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
2 帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
3 短裤、帽子、裙子、棉衣、短袖、衬衫、袜子
4 短裤、帽子、棉衣、短袖、衬衫、袜子
5 短裤、帽子、长裤、裙子、棉衣、短袖、袜子
6 短裤、帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
7 短裤、帽子、长裤、裙子、棉衣、短袖、
8 帽子、长裤、裙子、棉衣、衬衫、袜子
9 短裤、帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
10 短裤、帽子、长裤、裙子、短袖、衬衫、袜子
11 短裤、帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
12 短裤、帽子、长裤、棉衣、短袖、衬衫、袜子
13 短裤、长裤、裙子、棉衣、短袖、衬衫、袜子
14 帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
15 短裤、帽子、长裤、裙子、棉衣、短袖、衬衫、袜子
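Each data line has the form "<id> <item>、<item>、...", and line 7 ends with a trailing delimiter. The sketch below shows one way to parse such a line defensively. The class and method names are hypothetical, the sample strings are made up, and the delimiter is written as the escape \u3001 (the ideographic comma used in dataset.txt) so that the example does not depend on the source file encoding.

```java
import java.util.*;

class ParseLineDemo {
    // U+3001 is the ideographic comma that separates the items in dataset.txt.
    static final String DELIMITER = "\u3001";

    // Parses "<id> <item><delim><item>..." into the item list.
    // Returns an empty list for blank or malformed lines.
    static List<String> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return Collections.emptyList();
        }
        List<String> items = new ArrayList<>();
        for (String item : parts[1].split(DELIMITER)) {
            if (!item.isEmpty()) { // ignore empty fields, e.g. from doubled delimiters
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the second one ends with a trailing delimiter like line 7.
        System.out.println(parseLine("1 hat\u3001socks\u3001shirt"));
        System.out.println(parseLine("7 hat\u3001socks\u3001"));
        System.out.println(parseLine(""));
    }
}
```

Java's `String.split` already drops trailing empty strings, so the trailing delimiter on line 7 does not create an empty item; the explicit `isEmpty` check only guards against empty fields elsewhere in the line.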

III. Results and Analysis

Table 2: All frequent itemsets
