Merging common prefixes with a trie (prefix tree), or recursively over a tree structure

Our project has a utility that merges space names sharing a common prefix, but it was not well written, so I wrote two versions of my own.

A concrete case: A栋-1F-101 and A栋-1F-102 become A栋-1F-(101、102). (栋 means "building", so A栋 is Building A.)
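To make the target concrete, here is a minimal sketch of the simplest possible merge: group names by everything up to the last "-" and join the tails. It is not the project's utility and not either of the two methods below. The class name LastSegmentMergeSketch is made up for illustration, and the sample names are ASCII stand-ins with a plain comma as the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LastSegmentMergeSketch {

    /** Groups names by the text up to and including the last '-' and merges each group. */
    static List<String> merge(List<String> names) {
        Map<String, List<String>> tailsByHead = new LinkedHashMap<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            int cut = name.lastIndexOf('-');
            if (cut < 0) {
                // No delimiter, so there is no head to share; such names are emitted first.
                result.add(name);
                continue;
            }
            tailsByHead.computeIfAbsent(name.substring(0, cut + 1), k -> new ArrayList<>())
                    .add(name.substring(cut + 1));
        }
        for (Map.Entry<String, List<String>> e : tailsByHead.entrySet()) {
            List<String> tails = e.getValue();
            result.add(tails.size() == 1
                    ? e.getKey() + tails.get(0)
                    : e.getKey() + "(" + String.join(",", tails) + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A-1F-(101,102), A-2F-202]
        System.out.println(merge(Arrays.asList("A-1F-101", "A-1F-102", "A-2F-202")));
    }
}
```

This only ever merges the last level, which is why the two methods below work on a tree or a trie instead.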

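There are two approaches. Building a real tree is the more rigorous one: the levels are explicit, so a string is never cut in the wrong place, and you can also group across several levels if the requirements call for it.
The trie approach is more general, because it still works when you have no tree-structured data and only a flat list of strings. Its weakness is deciding when to open a branch, which comes down to how you write the if condition. For now I use the simple check sons.size() > 1 && index > 0, meaning that, apart from the root, merging starts at the first branch encountered. This can produce a result different from the tree recursion, for example with A栋-1f-101, A栋-1f-102, A栋-2f.
In that case the trie version gives A栋-(1f-101,1f-102,2f); other groupings are hard to implement or hard to get right.
With a tree you always know whether the current child is a leaf (its children list is empty). A trie is more of a black box: when you are about to traverse a node whose sons.size() is greater than 1, you have no idea how many characters follow in each branch or whether one of them will branch again. You only know the whole string once you have traversed it, and whether that further branch falls where you want it is also unknown. For A栋-1f-101 and A栋-1f-1024 you get A栋-1f-10(1,24), which looks odd.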
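One way to keep the generality of a trie while avoiding cuts in the middle of a number is to key the trie on whole "-"-separated segments instead of single characters. The following is a sketch of that variation under the assumption that "-" is always the level delimiter. It is not the code used later in this post; SegmentTrieSketch and its methods are hypothetical names, and the sample data is ASCII.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SegmentTrieSketch {

    /** One trie node per "-"-separated segment rather than per character. */
    static final class Node {
        final Map<String, Node> sons = new LinkedHashMap<>();
        boolean isEnd;
    }

    static Node build(List<String> names) {
        Node root = new Node();
        for (String name : names) {
            Node cur = root;
            for (String segment : name.split("-")) {
                cur = cur.sons.computeIfAbsent(segment, k -> new Node());
            }
            cur.isEnd = true;
        }
        return root;
    }

    /** Returns the merged strings below node, written relative to node. */
    static List<String> collect(Node node) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Node> e : node.sons.entrySet()) {
            Node son = e.getValue();
            if (son.isEnd) {
                out.add(e.getKey()); // the path ending here is itself an input name
            }
            List<String> below = collect(son);
            if (below.size() == 1) {
                out.add(e.getKey() + "-" + below.get(0));
            } else if (below.size() > 1) {
                out.add(e.getKey() + "-(" + String.join(",", below) + ")");
            }
        }
        return out;
    }

    static List<String> merge(List<String> names) {
        return collect(build(names));
    }

    public static void main(String[] args) {
        // prints [A-1F-(101,1024)] rather than cutting inside the number
        System.out.println(merge(Arrays.asList("A-1F-101", "A-1F-1024")));
    }
}
```

The trade-off is that the delimiter has to be known up front, which is exactly the information a character-level trie does not have.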

Method 1: recurse over a tree structure. This assumes the input really can be assembled into a tree.

The code is fairly long because it also contains helpers for building the tree, printing the tree, and constructing full names. mergeSameHead is the main method.

Full names of the sample nodes:
[a栋, a栋-1f, a栋-2f, b栋, c栋, c栋-1f, d栋, d栋-1f, d栋-1f-202间, d栋-2f, d栋-4f, d栋-3f, d栋-3f-808间, d栋-3f-909间]
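Result of mergeSameHead_inner. This version merges only when it meets several leaf nodes under the same parent. The outcome is easy to read even though little gets merged, and the check is simple: while iterating a list of siblings, if two or more of them have no children, join them.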
[a栋(1f,2f), c栋-1f, d栋-1f-202间, d栋-3f(808间,909间), d栋(2f,4f), b栋]
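The leaf-only rule described above can be sketched as follows. This is a reconstruction from the description, not the original mergeSameHead_inner: the Node class, the sample() helper and the class name are assumptions, the names are ASCII, and the sketch writes a "-" before the bracket as in the A栋-1F-(101、102) example, so formatting and ordering may differ from the output shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeafSiblingMergeSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    static List<String> merge(List<Node> roots) {
        List<String> out = new ArrayList<>();
        visit(roots, "", out);
        return out;
    }

    private static void visit(List<Node> siblings, String parentFullName, List<String> out) {
        List<String> leafNames = new ArrayList<>();
        for (Node n : siblings) {
            String fullName = parentFullName.isEmpty() ? n.name : parentFullName + "-" + n.name;
            if (!n.children.isEmpty()) {
                visit(n.children, fullName, out);   // not a leaf: only descend
            } else if (parentFullName.isEmpty()) {
                out.add(fullName);                  // root-level leaf: no parent to merge under
            } else {
                leafNames.add(n.name);              // leaf sibling: candidate for merging
            }
        }
        if (leafNames.size() == 1) {
            out.add(parentFullName + "-" + leafNames.get(0));
        } else if (leafNames.size() > 1) {
            out.add(parentFullName + "-(" + String.join(",", leafNames) + ")");
        }
    }

    /** Same shape as the sample data in this post, with ASCII names. */
    static List<Node> sample() {
        return Arrays.asList(
                new Node("a", new Node("1f"), new Node("2f")),
                new Node("b"),
                new Node("c", new Node("1f")),
                new Node("d",
                        new Node("1f", new Node("202")),
                        new Node("2f"),
                        new Node("4f"),
                        new Node("3f", new Node("808"), new Node("909"))));
    }

    public static void main(String[] args) {
        // prints [a-(1f,2f), b, c-1f, d-1f-202, d-3f-(808,909), d-(2f,4f)]
        System.out.println(merge(sample()));
    }
}
```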
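Result of mergeSameHead_inner_v2. This version merges across levels: it starts merging at the first branch and merges again at every further branch. To keep the levels distinguishable it carries an auxiliary depth parameter, because a new bracket style is only needed when a branch is actually opened.
depth is incremented only in the else of if (treeList.size() == 1 || VIRTUAL_TOP.equals(parent)), that is, when the list has two or more elements and a new branch opens, so each nesting level gets its own bracket. In the listing below, this depth-aware logic is the body of the method named mergeSameHead_inner.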
[a栋(1f,2f), b栋, c栋-1f, d栋(1f-202间,2f,4f,3f<808间,909间>)]


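import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.experimental.Accessors;
// Assumption: StringUtils and CollectionUtils come from Apache Commons; the original post does not show its imports.
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.StringUtils;

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

@Builder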
@AllArgsConstructor
@NoArgsConstructor
@Accessors(chain = true)
@Data
public class MergeSameHeadTree {

    static final MergeSameHeadTree VIRTUAL_TOP = MergeSameHeadTree.builder().fullName("").build();

    String id;

    String pid;

    String name;

    String fullName;

    List<MergeSameHeadTree> children;

    public static void main(String[] args) {

        List<MergeSameHeadTree> list = Arrays.asList(


//                // v1
//                MergeSameHeadTree.builder().id("d栋老爹").name("d栋老爹").build(),
//                MergeSameHeadTree.builder().id("d栋").name("d栋").pid("d栋老爹").build(),
//                // v1end

                // v2
                MergeSameHeadTree.builder().id("a栋").name("a栋").build(),
                MergeSameHeadTree.builder().id("a栋-1f").name("1f").pid("a栋").build(),
                MergeSameHeadTree.builder().id("a栋-2f").name("2f").pid("a栋").build(),
                MergeSameHeadTree.builder().id("b栋").name("b栋").build(),
                MergeSameHeadTree.builder().id("c栋").name("c栋").build(),
                MergeSameHeadTree.builder().id("c栋-1f").name("1f").pid("c栋").build(),
                MergeSameHeadTree.builder().id("d栋").name("d栋").build(),
                // v2end

                MergeSameHeadTree.builder().id("d栋-1f").name("1f").pid("d栋").build(),
                MergeSameHeadTree.builder().id("d栋-1f-202间").name("202间").pid("d栋-1f").build(),
                MergeSameHeadTree.builder().id("d栋-2f").name("2f").pid("d栋").build(),
                MergeSameHeadTree.builder().id("d栋-4f").name("4f").pid("d栋").build(),
                MergeSameHeadTree.builder().id("d栋-3f").name("3f").pid("d栋").build(),
                MergeSameHeadTree.builder().id("d栋-3f-808间").name("808间").pid("d栋-3f").build(),
                MergeSameHeadTree.builder().id("d栋-3f-909间").name("909间").pid("d栋-3f").build()
        );
        List<MergeSameHeadTree> treeList = buildTree(list, a -> StringUtils.isBlank(a.getPid()));
        buildFullName(treeList, "-");
        printTreeFuncSimple(MergeSameHeadTree::getName, treeList, 4);
        System.out.println(list.stream().map(MergeSameHeadTree::getFullName).collect(Collectors.toList()));
        List<String> mergeSameHead = mergeSameHead(treeList);
        System.out.println(mergeSameHead);

    }

    public static List<String> mergeSameHead(List<MergeSameHeadTree> treeList) {
        if (CollectionUtils.isEmpty(treeList)) {
            return new ArrayList<>();
        }
        List<String> res = new ArrayList<>();
        mergeSameHead_inner(treeList, VIRTUAL_TOP, res, 0);
        return res;
    }

    static String[][] kuohao = new String[][]{
            {"(", ")"},
            {"<", ">"},
            {"[", "]"},
            {"{", "}"}
    };

    private static void mergeSameHead_inner(List<MergeSameHeadTree> treeList, MergeSameHeadTree parent,
                                            List<String> mergeNameList, int depth) {
        if (CollectionUtils.isEmpty(treeList)) {
            mergeNameList.add(parent.getFullName());
            return;
        }
        if (treeList.size() == 1 || VIRTUAL_TOP.equals(parent)) {
            for (MergeSameHeadTree y : treeList) {
                mergeSameHead_inner(y.getChildren(), y, mergeNameList, depth);
            }
        } else {
            List<String> subRes = new ArrayList<>();
            for (MergeSameHeadTree y : treeList) {
                mergeSameHead_inner(y.getChildren(), y, subRes, depth + 1);
            }
            String[] bracket = kuohao[depth % kuohao.length];
            mergeNameList.add(subRes.stream().map(a -> a.substring(parent.getFullName().length() + 1))
                    .collect(Collectors.joining(",", parent.getFullName() + "-" + bracket[0], bracket[1])));
        }
    }

    public static List<MergeSameHeadTree> buildTree(List<MergeSameHeadTree> originList, Function<? super MergeSameHeadTree, Boolean> isRoot) {
        List<MergeSameHeadTree> treeList = new ArrayList<>();
        if (CollectionUtils.isEmpty(originList)) {
            return treeList;
        }
        Map<String, MergeSameHeadTree> originMap = new HashMap<>(originList.size());
        for (MergeSameHeadTree cic : originList) {
            // Skip invalid entries instead of calling iterator.remove(): the list built in
            // main() comes from Arrays.asList, which does not support removal.
            if (null == cic || StringUtils.isBlank(cic.getId())) {
                continue;
            }
            if (isRoot.apply(cic)) {
                treeList.add(cic);
            }
            cic.setChildren(new ArrayList<>());
            originMap.put(cic.getId(), cic);
        }
        // Attach each valid node to its parent, keeping the input order of the children.
        for (MergeSameHeadTree value : originList) {
            if (null == value || StringUtils.isBlank(value.getId())) {
                continue;
            }
            MergeSameHeadTree parent = originMap.get(value.getPid());
            if (null != parent) {
                parent.getChildren().add(value);
            }
        }
        return treeList;
    }

    public static void buildFullName(List<MergeSameHeadTree> treeList, String splitStr) {
        if (null == treeList || treeList.isEmpty()) {
            return;
        }
        for (MergeSameHeadTree s : treeList) {
            s.setFullName(s.getName());
            buildFullNameChild(s.getChildren(), s.getFullName(), splitStr);
        }
    }

    private static void buildFullNameChild(List<MergeSameHeadTree> treeList, String head, String splitStr) {
        if (null == treeList || treeList.isEmpty()) {
            return;
        }
        for (MergeSameHeadTree s : treeList) {
            s.setFullName(head + splitStr + s.getName());
            buildFullNameChild(s.getChildren(), s.getFullName(), splitStr);
        }
    }

    public static void printTreeFuncSimple(Function<? super MergeSameHeadTree, String> fun,
                                           List<MergeSameHeadTree> treeList, int tabSize) {
        printTreeFuncSimple(fun, treeList, tabSize, 0);
    }

    private static void printTreeFuncSimple(Function<? super MergeSameHeadTree, String> fun,
                                            List<MergeSameHeadTree> treeList, int tabSize, int level) {
        if (null == treeList || treeList.isEmpty()) {
            return;
        }
        for (MergeSameHeadTree s : treeList) {
            System.out.println(" ".repeat(level * tabSize) + "\\" + "_".repeat(3) + fun.apply(s));
            List<MergeSameHeadTree> children = s.getChildren();
            printTreeFuncSimple(fun, children, tabSize, level + 1);
        }
    }

}


Method 2: trie

While reading the old code I found this method and reworked it. The new version comes first, followed by the old one.


import java.util.*;
import java.util.function.Function;

public class CharDicByMapMergeSameHead {
    static char[] forPrintArr = new char[16];

    static CharDicByMapMergeSameHead charDic = new CharDicByMapMergeSameHead(Arrays.asList(
            ("A栋-1F-101,A栋-1F-102,A栋-1F-102-4号,A栋-1F-102-5号-1号货架,A栋-1F-102-5号-12号货架," +
                    "A栋-1F-1077,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-3144,B栋-4F-整层,C栋-3F,C栋-4F").split(",")));

    public static void main(String[] args) {
        System.out.println(" Trie contents ");
        charDic.printTree();

        合并相同前缀();

    }


    private static void 合并相同前缀() {
        List<String> list = Arrays.asList(("A栋-1F-101,A栋-1F-102,A栋-1F-102-4号,A栋-1F-102-5号-1号货架,A栋-1F-102-5号-12号货架," +
                "A栋-1F-1077,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-3144,B栋-4F-整层,C栋-3F,C栋-4F").split(","));
        CharDicByMapMergeSameHead charDicByMap = new CharDicByMapMergeSameHead(list);
        List<String> res = charDicByMap.getMergeHeadList();
        res.forEach(System.out::println);
    }

    private void printTree() {
        HashMap<Character, Node> sons = root.sons;
        printTree(sons, 0, f -> f.isEnd ? "(LEAF)" : "");
    }

    private void printTree(HashMap<Character, Node> sons, int depth, Function<? super Node, String> funcTail) {
        if (sons.isEmpty()) {
            return;
        }
        for (Map.Entry<Character, Node> entry : sons.entrySet()) {
            forPrintArr[depth] = entry.getKey();
            System.out.println(" ".repeat(depth * 4 + 4) + "\\" + "_".repeat(depth) + entry.getKey() + "[" + new String(forPrintArr, 0, depth + 1) + "]" + funcTail.apply(entry.getValue()));
            printTree(entry.getValue().sons, depth + 1, funcTail);
        }
    }

    public List<String> getMergeHeadList() {
        HashMap<Character, Node> sons = root.sons;
        List<String> res = new ArrayList<>();
        char[] resTemp = new char[maxLen];
        getMergeHeadList_inner(res, root, sons, resTemp, 0, 0, 0);
        for (int j = 0; j < res.size(); j++) {
            String s = res.get(j);
            int len = 0;
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '(' || c == '[' || c == '{') {
                    len++;
                } else {
                    break;
                }
            }
            if (len > 0) {
                System.out.println(" [extra brackets] number of bracket pairs: " + len);
                res.set(j, s.substring(len, s.length() - len));
            }
        }
        return res;
    }

    static String[][] bracket = new String[][]{{"{", "}"}, {"(", ")"}, {"[", "]"}, {"<", ">"}};

    private void getMergeHeadList_inner(List<String> res, Node root, HashMap<Character, Node> sons, char[] resTemp, int start, int end, int bracketDepth) {
        if (null == sons || sons.isEmpty()) {
            if (root.isEnd) {
                res.add(new String(resTemp, start, end - start));
            }
            return;
        }
        if (sons.size() > 1) {
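            String resHead = new String(resTemp, start, end - start);
            if (root.isEnd) {
                // The prefix itself is one of the input strings; keep it as its own entry.
                res.add(resHead);
            }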
            List<String> branchAll = new ArrayList<>();
            for (Map.Entry<Character, Node> entry : sons.entrySet()) {
                resTemp[end] = entry.getKey();
                List<String> branch = new ArrayList<>();
                getMergeHeadList_inner(branch, entry.getValue(), entry.getValue().sons, resTemp, end, end + 1, bracketDepth + 1);
                branchAll.addAll(branch);
            }
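            // Count the digits at the end of resHead. The scan stops at start so that len
            // can never exceed resHead.length(), which would make substring() throw.
            int len = 0;
            for (; end > start && resTemp[end - 1] >= '0' && resTemp[end - 1] <= '9'; end--, len++) ;
            if (len > 0) {
                System.out.println(" moving trailing digits into the branches ");
                String tail2Head = resHead.substring(resHead.length() - len);
                resHead = resHead.substring(0, resHead.length() - len);
                branchAll.replaceAll(s -> tail2Head + s);
            }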
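            // An empty head means there is nothing to factor out. If every input shares its
            // first characters, the first branch is also reached with bracketDepth == 0 but
            // with a non-empty head, and that head must not be dropped.
            if (bracketDepth == 0 && resHead.isEmpty()) {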
                res.addAll(branchAll);
            } else {
                String[] curBracket = bracket[bracketDepth % bracket.length];
                StringJoiner sjChild = new StringJoiner("、", curBracket[0], curBracket[1]);
                branchAll.forEach(sjChild::add);
                res.add(resHead + sjChild);
            }
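        } else {
            if (root.isEnd) {
                // This node ends an input string but also has a longer continuation
                // (e.g. A栋-1F-102 next to A栋-1F-102-4号); emit it before descending.
                res.add(new String(resTemp, start, end - start));
            }
            Map.Entry<Character, Node> entry = root.sons.entrySet().iterator().next();
            resTemp[end] = entry.getKey();
            Node next = entry.getValue();
            getMergeHeadList_inner(res, next, next.sons, resTemp, start, end + 1, bracketDepth);
        }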
    }

    public static class Node {
        public HashMap<Character, Node> sons;

        public boolean isEnd;
        public int length;

        public Node() {
            sons = new HashMap<>();
            isEnd = false;
        }

    }

    public CharDicByMapMergeSameHead(Collection<String> list) {
        root = new Node();
        generateNodeByStringList(list);
    }

    public Node root;

    public int avgLen;
    public int mostLen;
    public int maxLen;

    int mostCount;
    int distinctCount;

    public void generateNodeByStringList(Collection<String> list) {
        if (list == null || list.isEmpty()) {
            return;
        }
        Map<Integer, Integer> map = new HashMap<>();
        long totalLen = 0;
        for (String f : list) {
            int length = f.length();
            totalLen += length;
            map.put(length, map.getOrDefault(length, 0) + 1);
            addSingle(f);
        }
        avgLen = (int) (totalLen / list.size()) + 1;
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            if (entry.getValue() > mostCount) {
                mostCount = entry.getValue();
                mostLen = entry.getKey();
            }
        }
    }

    public void addSingle(String f) {
        int length = f.length();
        maxLen = Math.max(maxLen, length);
        while (forPrintArr.length < maxLen) {
            forPrintArr = Arrays.copyOf(forPrintArr, forPrintArr.length << 1);
        }
        Node ro = root;
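        // Build the trie front to back
        for (int i = 0; i < length; i++) {
            char c = f.charAt(i);
            // With pure lower- or upper-case letters one could index an array by c - 'a' or c - 'A';
            // here the character itself is used as the map key.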
            if (ro.sons.get(c) == null) {
                // root.isEnd=false;
                ro.sons.put(c, new Node());
            }
            ro = ro.sons.get(c);
        }
        if (!ro.isEnd) {
            distinctCount++;
        }
        ro.isEnd = true;
        ro.length = length;
    }

}
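The "move trailing digits into the branches" step in getMergeHeadList_inner is what turns A栋-1f-10(1,24) into a more natural grouping. Isolated from the trie traversal, the idea can be sketched like this. DigitShiftSketch and mergeWithDigitShift are hypothetical names, and the sample uses ASCII names with a plain comma.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class DigitShiftSketch {

    /**
     * Joins the branches under head, but first moves any digits at the end of head
     * to the front of every branch so that numbers are never split by the bracket.
     */
    static String mergeWithDigitShift(String head, List<String> branches) {
        int cut = head.length();
        while (cut > 0 && head.charAt(cut - 1) >= '0' && head.charAt(cut - 1) <= '9') {
            cut--;
        }
        String digits = head.substring(cut);          // e.g. "10"
        String trimmedHead = head.substring(0, cut);  // e.g. "A-1F-"
        StringJoiner joined = new StringJoiner(",", trimmedHead + "(", ")");
        for (String branch : branches) {
            joined.add(digits + branch);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // prints A-1F-(101,1024)
        System.out.println(mergeWithDigitShift("A-1F-10", Arrays.asList("1", "24")));
    }
}
```

This only repairs splits inside runs of digits; a split inside a word would still be cut wherever the characters diverge.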


The old version follows.

Method 2 - trie - old

import java.util.*;
import java.util.stream.Collectors;

public class 用字典树合并相同头 {

    static Map<Integer, String> starMap = new HashMap<>();

    public static void main(String[] args) {

        获取共同前缀的结果();

    }

    private static void 获取共同前缀的结果() {
        List<String> list = Arrays.asList(("A栋-1F-101,A栋-1F-102,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-311,B栋-4F-整层," +
                "C栋-3F,C栋-4F,D栋-7F,E栋-4444F,E栋-666666F,hhhhhh栋-3F,hhhhhh栋-4F").split(","));
        用字典树合并相同头 charDicWithMap = new 用字典树合并相同头(list);
        List<String> mergeHeadList = charDicWithMap.getMergeHeadList();
        System.out.println(mergeHeadList);
    }

    public List<String> getMergeHeadList() {
        HashMap<Character, Node> sons = root.sons;
        List<String> res = new ArrayList<>();
        char[] resTemp = new char[maxLen];
        getMergeHeadList_inner(res, sons, resTemp, 0, false);
        //         getMergeHeadList_inner_v2(res, sons, resTemp, 0, 0);
        return res;
    }

    private void getMergeHeadList_inner(List<String> res, HashMap<Character, Node> sons, char[] resTemp, int index, boolean isBranch) {
        if (!isBranch) {
            String resHead = new String(resTemp, 0, index);
            // sons.size() >= 2 &&
//            if (index > 0 && (resTemp[index - 1] == '-' || index >= 5)) {
            if (sons.size() > 1 && index > 0) {
                List<String> branchAll = new ArrayList<>();
                for (Map.Entry<Character, Node> entry : sons.entrySet()) {
                    char[] charsChild = new char[resTemp.length - index];
                    charsChild[0] = entry.getKey();
                    List<String> branch = new ArrayList<>();
                    getMergeHeadList_inner(branch, entry.getValue().sons, charsChild, 1, true);
                    branchAll.addAll(branch);
                }
                if (branchAll.size() > 1) {
                    res.add(branchAll.stream().collect(Collectors.joining("、", resHead + "(", ")")));
                } else if (branchAll.size() == 1) {
                    res.add(resHead + branchAll.get(0));
                }
                return;
            }
        }
        for (Map.Entry<Character, Node> entry : sons.entrySet()) {
            resTemp[index] = entry.getKey();
            if (entry.getValue().isEnd) {
                res.add(new String(resTemp, 0, index + 1));
            } else {
                getMergeHeadList_inner(res, entry.getValue().sons, resTemp, index + 1, isBranch);
            }
        }
    }
    static String[][] kuohao = new String[][]{
            {"(", ")"},
            {"<", ">"},
            {"[", "]"},
            {"{", "}"}
    };

    private void getMergeHeadList_inner_v2(List<String> res, HashMap<Character, Node> sons, char[] resTemp, int index, int depth) {
        String resHead = new String(resTemp, 0, index);
          // Without the delimiter as the merge marker we get the case mentioned earlier:
          // A栋-1f-101 and A栋-1f-1024 become A栋-1f-10(1,24), which looks odd
        if (index > 0 && resTemp[index - 1] == '-') {
          //   System.out.println(" new branch detected, current string: " + new String(resTemp, 0, index));
            List<String> branchAll = new ArrayList<>();
            for (Map.Entry<Character, Node> entry : sons.entrySet()) {
                char[] charsChild = new char[resTemp.length - index];
                charsChild[0] = entry.getKey();
                List<String> branch = new ArrayList<>();
                getMergeHeadList_inner_v2(branch, entry.getValue().sons, charsChild, 1, depth + 1);
                branchAll.addAll(branch);
            }
            if (branchAll.size() > 1) {
                res.add(branchAll.stream().collect(Collectors.joining("、", resHead.substring(0, resHead.length() - 1) + kuohao[depth % kuohao.length][0], kuohao[depth % kuohao.length][1])));
            } else if (branchAll.size() == 1) {
                res.add(resHead + branchAll.get(0));
            }
            return;
        }
        // System.out.println(" [non-branch path] current string: " + new String(resTemp, 0, index));
        for (Map.Entry<Character, Node> entry : sons.entrySet()) {
            resTemp[index] = entry.getKey();
            if (entry.getValue().isEnd) {
                res.add(new String(resTemp, 0, index + 1));
            } else {
                getMergeHeadList_inner_v2(res, entry.getValue().sons, resTemp, index + 1, depth);
            }
        }
    }

    public static class Node {
        public HashMap<Character, Node> sons;
        public boolean isEnd;
        public int length;

        public Node() {
            sons = new HashMap<>();
            isEnd = false;
        }

    }

    public 用字典树合并相同头() {
        root = new Node();
    }

    public 用字典树合并相同头(Collection<String> list) {
        root = new Node();
        generateNodeByStringList(list);
    }

    public void generateNodeByStringList(Collection<String> list) {
        Map<Integer, Integer> map = new HashMap<>();
        long totalLen = 0;
        for (String f : list) {
            int length = f.length();
            totalLen += length;
            maxLen = Math.max(maxLen, length);
            map.put(length, map.getOrDefault(length, 0) + 1);
            if (!starMap.containsKey(length)) {
                starMap.put(length, "*".repeat(length).intern());
            }
            用字典树合并相同头.Node ro = root;
            for (int i = 0; i < length; i++) {
                char c = f.charAt(i);
                if (ro.sons.get(c) == null) {
                    ro.sons.put(c, new 用字典树合并相同头.Node());
                }
                ro = ro.sons.get(c);
            }
            if (!ro.isEnd) {
                distinctCount++;
            }
            ro.isEnd = true;
            ro.length = length;
        }
        avgLen = (int) (totalLen / list.size()) + 1;
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            if (entry.getValue() > mostCount) {
                mostCount = entry.getValue();
                mostLen = entry.getKey();
            }
        }
        System.out.println(" trie element count: " + distinctCount + " , mode of lengths: " + mostLen + " , max length: " + maxLen + " , average length: " + avgLen);

    }

    public Node root;

    /**
     * Average length
     */
    public int avgLen;
    /**
     * Mode of the lengths (most frequent length)
     */
    public int mostLen;
    /**
     * Maximum length
     */
    public int maxLen;

    int mostCount;
    int distinctCount;

}
