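I noticed a utility in the project that merges space (location) names sharing a common prefix. It wasn't written well, so I wrote two versions of my own.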
Concretely: A栋-1F-101 and A栋-1F-102 (栋 = building, 1F = first floor, 101 = room) should become A栋-1F-(101、102).
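To pin down the target format, here is a single-level sketch. It is not the project's tool or either of the methods below. It assumes '-' separates levels and only merges names that differ in their last segment. `LastSegmentMerge` and `merge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LastSegmentMerge {
    // Groups names by everything before their last '-' (assumed level separator)
    // and merges each group with two or more members into "head-(a、b)".
    public static List<String> merge(List<String> names) {
        Map<String, List<String>> groups = new LinkedHashMap<>(); // keeps input order
        for (String name : names) {
            int cut = name.lastIndexOf('-');
            String head = cut < 0 ? "" : name.substring(0, cut);
            String tail = name.substring(cut + 1);
            groups.computeIfAbsent(head, k -> new ArrayList<>()).add(tail);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            List<String> tails = e.getValue();
            String body = tails.size() == 1 ? tails.get(0) : "(" + String.join("、", tails) + ")";
            result.add(e.getKey().isEmpty() ? body : e.getKey() + "-" + body);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("A栋-1F-101", "A栋-1F-102")));
        // prints [A栋-1F-(101、102)]
    }
}
```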
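There are two approaches. Building a tree is the more rigorous one. The levels are explicit, so a string can never be cut in the wrong place, and you can also group across several levels if your requirements call for it.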
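The trie approach is more general, because it also works when you have no tree-structured data, only a list of strings. Its weakness is deciding when to open a branch, which depends entirely on how you write the if condition. For now I simply use sons.size() > 1 && index > 0, i.e. apart from the root, merge at the first point where the strings branch. This can give a different result from the tree recursion, for example with A栋-1f-101, A栋-1f-102, A栋-2f.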
Here the trie gives A栋-(1f-101,1f-102,2f). Other groupings are hard to implement or hard to get exactly as expected.
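The first-branch rule can be tried in isolation. This is a simplified, hypothetical reimplementation, not the post's code. It uses a `TreeMap` instead of a `HashMap` so children come out in a stable order, and a `StringBuilder` instead of a shared `char[]` buffer. A word that ends at a node is emitted before any merge at that node.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FirstBranchTrieMerge {
    // Minimal trie node; TreeMap keeps children sorted so the output order is stable.
    static final class Node {
        final TreeMap<Character, Node> sons = new TreeMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    public FirstBranchTrieMerge(Collection<String> words) {
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.sons.computeIfAbsent(c, k -> new Node());
            }
            cur.isEnd = true;
        }
    }

    public List<String> merge() {
        List<String> out = new ArrayList<>();
        walk(root, new StringBuilder(), out);
        return out;
    }

    // Follows shared characters down from the root. At the first node below the root
    // with more than one child, everything underneath becomes one merged entry
    // (the equivalent of sons.size() > 1 && index > 0).
    private void walk(Node node, StringBuilder prefix, List<String> out) {
        if (node.isEnd) {
            out.add(prefix.toString());
        }
        if (node.sons.size() > 1 && prefix.length() > 0) {
            List<String> suffixes = new ArrayList<>();
            for (Map.Entry<Character, Node> e : node.sons.entrySet()) {
                collect(e.getValue(), new StringBuilder().append(e.getKey()), suffixes);
            }
            out.add(prefix + "(" + String.join("、", suffixes) + ")");
            return;
        }
        for (Map.Entry<Character, Node> e : node.sons.entrySet()) {
            prefix.append(e.getKey());
            walk(e.getValue(), prefix, out);
            prefix.setLength(prefix.length() - 1);
        }
    }

    // Collects every complete word below `node`, relative to the branching point.
    private void collect(Node node, StringBuilder suffix, List<String> out) {
        if (node.isEnd) {
            out.add(suffix.toString());
        }
        for (Map.Entry<Character, Node> e : node.sons.entrySet()) {
            suffix.append(e.getKey());
            collect(e.getValue(), suffix, out);
            suffix.setLength(suffix.length() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(new FirstBranchTrieMerge(
                List.of("A栋-1f-101", "A栋-1f-102", "A栋-2f")).merge());
        // prints [A栋-(1f-101、1f-102、2f)]
    }
}
```

If you feed it A栋-1f-101 and A栋-1f-1024, the same class produces A栋-1f-10(1、24). That is exactly the kind of odd split described next.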
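With a tree you know explicitly whether the current child is a leaf (its children list is empty). A trie is more of a black box. When you are about to walk a node with sons.size() > 1, you don't know how many characters each branch still has, or whether a branch will split again further down. You only learn the full strings after traversing everything, and whether that further split is the one you want is also unknown. For example, A栋-1f-101 and A栋-1f-1024 merge into A栋-1f-10(1,24), which is rather odd.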
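One way to avoid such splits is to compute the character-level common prefix first and then back off to just after the last separator. This is also the idea behind the old version's rule of branching only right after '-'. Below is a minimal sketch. It assumes '-' is the separator and that the names form a single group. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorAwarePrefix {
    // Longest common prefix of all names, compared character by character.
    static String commonPrefix(List<String> names) {
        String p = names.get(0);
        for (String s : names) {
            int i = 0;
            while (i < p.length() && i < s.length() && p.charAt(i) == s.charAt(i)) {
                i++;
            }
            p = p.substring(0, i);
        }
        return p;
    }

    // Merges one group, cutting only right after the last separator inside the
    // common prefix, so a segment such as "101" / "1024" is never split in the middle.
    static String mergeAtSeparator(List<String> names, char sep) {
        String raw = commonPrefix(names);
        int cut = raw.lastIndexOf(sep) + 1; // 0 if the prefix contains no separator
        List<String> tails = new ArrayList<>();
        for (String s : names) {
            tails.add(s.substring(cut));
        }
        return raw.substring(0, cut) + "(" + String.join("、", tails) + ")";
    }

    public static void main(String[] args) {
        List<String> names = List.of("A栋-1f-101", "A栋-1f-1024");
        System.out.println(commonPrefix(names));          // A栋-1f-10
        System.out.println(mergeAtSeparator(names, '-')); // A栋-1f-(101、1024)
    }
}
```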
Method 1: build a tree and recurse. This requires that the input really can be built into a tree.
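The code is fairly long because it also contains helpers to build the tree, print it, and build each node's full name. mergeSameHead is the main method. The full names of the input nodes are: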
[a栋, a栋-1f, a栋-2f, b栋, c栋, c栋-1f, d栋, d栋-1f, d栋-1f-202间, d栋-2f, d栋-4f, d栋-3f, d栋-3f-808间, d栋-3f-909间]
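Result of mergeSameHead_inner: this version only merges at leaf-level branches, i.e. only leaf nodes under the same parent are merged. It is clear, though it merges less aggressively, and the check is simple: while walking a child list, as soon as it contains two or more leaf nodes (no children), they can be joined.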
[a栋(1f,2f), c栋-1f, d栋-1f-202间, d栋-3f(808间,909间), d栋(2f,4f), b栋]
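The code for this first version is not included in the post. The following is therefore only a hypothetical sketch of the leaf-only rule described above. It uses a minimal stand-in `Node` class instead of `MergeSameHeadTree`, so it needs no Lombok or commons libraries. For the sample tree it produces the same entries as the output above, possibly in a different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeafOnlyMerge {
    // Hypothetical stand-in for MergeSameHeadTree: a name plus its children.
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    static List<String> merge(List<Node> roots) {
        List<String> out = new ArrayList<>();
        for (Node r : roots) {
            walk(r, r.name, out);
        }
        return out;
    }

    // Only leaves sharing the same parent are merged, and only when there are at
    // least two of them; non-leaf children are recursed into separately.
    private static void walk(Node node, String fullName, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(fullName);
            return;
        }
        List<String> leaves = new ArrayList<>();
        for (Node c : node.children) {
            if (c.children.isEmpty()) {
                leaves.add(c.name);
            } else {
                walk(c, fullName + "-" + c.name, out);
            }
        }
        if (leaves.size() >= 2) {
            out.add(fullName + "(" + String.join(",", leaves) + ")");
        } else if (leaves.size() == 1) {
            out.add(fullName + "-" + leaves.get(0));
        }
    }

    public static void main(String[] args) {
        List<Node> roots = List.of(
                new Node("a栋", new Node("1f"), new Node("2f")),
                new Node("b栋"),
                new Node("c栋", new Node("1f")),
                new Node("d栋",
                        new Node("1f", new Node("202间")),
                        new Node("2f"),
                        new Node("4f"),
                        new Node("3f", new Node("808间"), new Node("909间"))));
        System.out.println(merge(roots));
    }
}
```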
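Result of mergeSameHead_inner_v2 (the code below implements this version, under the name mergeSameHead_inner): it merges across levels. It starts merging as soon as it meets a branch and keeps merging at every further branch. To keep the levels distinguishable it uses an auxiliary depth parameter, since a new bracket is only needed when a branch actually opens.
depth is incremented only in the else of if (treeList.size() == 1 || VIRTUAL_TOP.equals(parent)), i.e. when the list has two or more elements and a new branch opens, so that the next bracket style is used.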
[a栋(1f,2f), b栋, c栋-1f, d栋(1f-202间,2f,4f,3f<808间,909间>)]
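To experiment without Lombok or commons dependencies, here is a dependency-free sketch of the same multi-level recursion. It again uses a hypothetical stand-in `Node` class. As in the output above, the bracket follows the parent name directly. `depth` selects the bracket pair, and a node with a single child is descended into without opening a bracket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MultiLevelMerge {
    // Hypothetical stand-in for MergeSameHeadTree: a name plus its children.
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Bracket pairs cycled by branch depth, in the same order as the post's kuohao table.
    static final String[][] BRACKETS = {{"(", ")"}, {"<", ">"}, {"[", "]"}, {"{", "}"}};

    static List<String> merge(List<Node> roots) {
        List<String> out = new ArrayList<>();
        // The top level plays the role of VIRTUAL_TOP: roots are never merged together.
        for (Node root : roots) {
            walk(root.children, root.name, 0, out);
        }
        return out;
    }

    private static void walk(List<Node> kids, String parentFull, int depth, List<String> out) {
        if (kids.isEmpty()) {            // leaf: emit its full name
            out.add(parentFull);
            return;
        }
        if (kids.size() == 1) {          // no branch: descend without opening a bracket
            Node only = kids.get(0);
            walk(only.children, parentFull + "-" + only.name, depth, out);
            return;
        }
        // A real branch: collect children one bracket level deeper, then wrap them.
        List<String> sub = new ArrayList<>();
        for (Node kid : kids) {
            walk(kid.children, parentFull + "-" + kid.name, depth + 1, sub);
        }
        String[] b = BRACKETS[depth % BRACKETS.length];
        out.add(sub.stream()
                .map(s -> s.substring(parentFull.length() + 1)) // drop "parent-"
                .collect(Collectors.joining(",", parentFull + b[0], b[1])));
    }

    public static void main(String[] args) {
        List<Node> roots = List.of(
                new Node("a栋", new Node("1f"), new Node("2f")),
                new Node("b栋"),
                new Node("c栋", new Node("1f")),
                new Node("d栋",
                        new Node("1f", new Node("202间")),
                        new Node("2f"),
                        new Node("4f"),
                        new Node("3f", new Node("808间"), new Node("909间"))));
        System.out.println(merge(roots));
    }
}
```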
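// CollectionUtils.isEmpty exists in commons-collections4 and in Spring (org.springframework.util); import whichever the project already uses
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.experimental.Accessors;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.StringUtils;
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;
@Builder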
@AllArgsConstructor
@NoArgsConstructor
@Accessors(chain = true)
@Data
public class MergeSameHeadTree {
static final MergeSameHeadTree VIRTUAL_TOP = MergeSameHeadTree.builder().fullName("").build();
String id;
String pid;
String name;
String fullName;
List<MergeSameHeadTree> children;
public static void main(String[] args) {
List<MergeSameHeadTree> list = Arrays.asList(
// // v1
// MergeSameHeadTree.builder().id("d栋老爹").name("d栋老爹").build(),
// MergeSameHeadTree.builder().id("d栋").name("d栋").pid("d栋老爹").build(),
// // v1end
// v2
MergeSameHeadTree.builder().id("a栋").name("a栋").build(),
MergeSameHeadTree.builder().id("a栋-1f").name("1f").pid("a栋").build(),
MergeSameHeadTree.builder().id("a栋-2f").name("2f").pid("a栋").build(),
MergeSameHeadTree.builder().id("b栋").name("b栋").build(),
MergeSameHeadTree.builder().id("c栋").name("c栋").build(),
MergeSameHeadTree.builder().id("c栋-1f").name("1f").pid("c栋").build(),
MergeSameHeadTree.builder().id("d栋").name("d栋").build(),
// v2end
MergeSameHeadTree.builder().id("d栋-1f").name("1f").pid("d栋").build(),
MergeSameHeadTree.builder().id("d栋-1f-202间").name("202间").pid("d栋-1f").build(),
MergeSameHeadTree.builder().id("d栋-2f").name("2f").pid("d栋").build(),
MergeSameHeadTree.builder().id("d栋-4f").name("4f").pid("d栋").build(),
MergeSameHeadTree.builder().id("d栋-3f").name("3f").pid("d栋").build(),
MergeSameHeadTree.builder().id("d栋-3f-808间").name("808间").pid("d栋-3f").build(),
MergeSameHeadTree.builder().id("d栋-3f-909间").name("909间").pid("d栋-3f").build()
);
List<MergeSameHeadTree> treeList = buildTree(list, a -> StringUtils.isBlank(a.getPid()));
buildFullName(treeList, "-");
printTreeFuncSimple(MergeSameHeadTree::getName, treeList, 4);
System.out.println(list.stream().map(MergeSameHeadTree::getFullName).collect(Collectors.toList()));
List<String> mergeSameHead = mergeSameHead(treeList);
System.out.println(mergeSameHead);
}
public static List<String> mergeSameHead(List<MergeSameHeadTree> treeList) {
if (CollectionUtils.isEmpty(treeList)) {
return new ArrayList<>();
}
List<String> res = new ArrayList<>();
mergeSameHead_inner(treeList, VIRTUAL_TOP, res, 0);
return res;
}
static String[][] kuohao = new String[][]{
{"(", ")"},
{"<", ">"},
{"[", "]"},
{"{", "}"}
};
private static void mergeSameHead_inner(List<MergeSameHeadTree> treeList, MergeSameHeadTree parent,
List<String> mergeNameList, int depth) {
if (CollectionUtils.isEmpty(treeList)) {
mergeNameList.add(parent.getFullName());
return;
}
if (treeList.size() == 1 || VIRTUAL_TOP.equals(parent)) {
for (MergeSameHeadTree y : treeList) {
mergeSameHead_inner(y.getChildren(), y, mergeNameList, depth);
}
} else {
List<String> subRes = new ArrayList<>();
for (MergeSameHeadTree y : treeList) {
mergeSameHead_inner(y.getChildren(), y, subRes, depth + 1);
}
String[] bracket = kuohao[depth % kuohao.length];
// the bracket goes right after the parent name, matching the sample output d栋(1f-202间,...)
mergeNameList.add(subRes.stream().map(a -> a.substring(parent.getFullName().length() + 1))
.collect(Collectors.joining(",", parent.getFullName() + bracket[0], bracket[1])));
}
}
public static List<MergeSameHeadTree> buildTree(List<MergeSameHeadTree> originList, Function<? super MergeSameHeadTree, Boolean> isRoot) {
List<MergeSameHeadTree> treeList = new ArrayList<>();
if (CollectionUtils.isEmpty(originList)) {
return treeList;
}
Map<String, MergeSameHeadTree> originMap = new HashMap<>(originList.size());
// Collect valid nodes into a separate list instead of calling iterator.remove():
// the input may be fixed-size (e.g. Arrays.asList in main), where remove() throws UnsupportedOperationException
List<MergeSameHeadTree> valid = new ArrayList<>(originList.size());
for (MergeSameHeadTree cic : originList) {
if (null == cic || StringUtils.isBlank(cic.getId())) {
continue;
}
if (isRoot.apply(cic)) {
treeList.add(cic);
}
cic.setChildren(new ArrayList<>());
originMap.put(cic.getId(), cic);
valid.add(cic);
}
// attach every node to its parent; a missing or unknown pid means it has no parent
for (MergeSameHeadTree value : valid) {
MergeSameHeadTree parent = originMap.get(value.getPid());
if (null != parent) {
parent.getChildren().add(value);
}
}
return treeList;
}
public static void buildFullName(List<MergeSameHeadTree> treeList, String splitStr) {
if (null == treeList || treeList.isEmpty()) {
return;
}
for (MergeSameHeadTree s : treeList) {
s.setFullName(s.getName());
buildFullNameChild(s.getChildren(), s.getFullName(), splitStr);
}
}
private static void buildFullNameChild(List<MergeSameHeadTree> treeList, String head, String splitStr) {
if (null == treeList || treeList.isEmpty()) {
return;
}
for (MergeSameHeadTree s : treeList) {
s.setFullName(head + splitStr + s.getName());
buildFullNameChild(s.getChildren(), s.getFullName(), splitStr);
}
}
public static void printTreeFuncSimple(Function<? super MergeSameHeadTree, String> fun,
List<MergeSameHeadTree> treeList, int tabSize) {
printTreeFuncSimple(fun, treeList, tabSize, 0);
}
private static void printTreeFuncSimple(Function<? super MergeSameHeadTree, String> fun,
List<MergeSameHeadTree> treeList, int tabSize, int level) {
if (null == treeList || treeList.isEmpty()) {
return;
}
for (MergeSameHeadTree s : treeList) {
System.out.println(" ".repeat(level * tabSize) + "\\" + "_".repeat(3) + fun.apply(s));
List<MergeSameHeadTree> children = s.getChildren();
printTreeFuncSimple(fun, children, tabSize, level + 1);
}
}
}
Method 2: trie
I came across this approach while reading old code and reworked it. The new version comes first and the old one follows.
import java.util.*;
import java.util.function.Function;
public class CharDicByMapMergeSameHead {
static char[] forPrintArr = new char[16];
static CharDicByMapMergeSameHead charDic = new CharDicByMapMergeSameHead(Arrays.asList(
("A栋-1F-101,A栋-1F-102,A栋-1F-102-4号,A栋-1F-102-5号-1号货架,A栋-1F-102-5号-12号货架," +
"A栋-1F-1077,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-3144,B栋-4F-整层,C栋-3F,C栋-4F").split(",")));
public static void main(String[] args) {
System.out.println(" trie contents ");
charDic.printTree();
合并相同前缀();
}
private static void 合并相同前缀() {
List<String> list = Arrays.asList(("A栋-1F-101,A栋-1F-102,A栋-1F-102-4号,A栋-1F-102-5号-1号货架,A栋-1F-102-5号-12号货架," +
"A栋-1F-1077,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-3144,B栋-4F-整层,C栋-3F,C栋-4F").split(","));
CharDicByMapMergeSameHead charDicByMap = new CharDicByMapMergeSameHead(list);
List<String> res = charDicByMap.getMergeHeadList();
res.forEach(System.out::println);
}
private void printTree() {
HashMap<Character, Node> sons = root.sons;
printTree(sons, 0, f -> f.isEnd ? "(LEAF)" : "");
}
private void printTree(HashMap<Character, Node> sons, int depth, Function<? super Node, String> funcTail) {
if (sons.isEmpty()) {
return;
}
for (Map.Entry<Character, Node> entry : sons.entrySet()) {
forPrintArr[depth] = entry.getKey();
System.out.println(" ".repeat(depth * 4 + 4) + "\\" + "_".repeat(depth) + entry.getKey() + "[" + new String(forPrintArr, 0, depth + 1) + "]" + funcTail.apply(entry.getValue()));
printTree(entry.getValue().sons, depth + 1, funcTail);
}
}
public List<String> getMergeHeadList() {
HashMap<Character, Node> sons = root.sons;
List<String> res = new ArrayList<>();
char[] resTemp = new char[maxLen];
getMergeHeadList_inner(res, root, sons, resTemp, 0, 0, 0);
for (int j = 0; j < res.size(); j++) {
String s = res.get(j);
int len = 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == '(' || c == '[' || c == '{') {
len++;
} else {
break;
}
}
if (len > 0) {
System.out.println(" [extra brackets] pairs stripped: " + len);
res.set(j, s.substring(len, s.length() - len));
}
}
return res;
}
static String[][] bracket = new String[][]{{"{", "}"}, {"(", ")"}, {"[", "]"}, {"<", ">"}};
private void getMergeHeadList_inner(List<String> res, Node root, HashMap<Character, Node> sons, char[] resTemp, int start, int end, int bracketDepth) {
if (null == sons || sons.isEmpty()) {
if (root.isEnd) {
res.add(new String(resTemp, start, end - start));
}
return;
}
// a word that is also a prefix of longer words (A栋-1F-102 vs A栋-1F-102-4号) still has sons,
// so it must be emitted here, otherwise it is lost
if (root.isEnd) {
res.add(new String(resTemp, start, end - start));
}
if (sons.size() > 1) {
String resHead = new String(resTemp, start, end - start);
List<String> branchAll = new ArrayList<>();
for (Map.Entry<Character, Node> entry : sons.entrySet()) {
resTemp[end] = entry.getKey();
List<String> branch = new ArrayList<>();
getMergeHeadList_inner(branch, entry.getValue(), entry.getValue().sons, resTemp, end, end + 1, bracketDepth + 1);
branchAll.addAll(branch);
}
// move the head's trailing digits into every branch, so 1F-10 + (1、2) becomes 1F- + (101、102);
// stop at start: earlier characters belong to an outer level, and crossing it would make substring throw
int len = 0;
for (; end > start && resTemp[end - 1] >= '0' && resTemp[end - 1] <= '9'; end--, len++) ;
if (len > 0) {
System.out.println(" moving trailing digits into the branches ");
String tail2Head = resHead.substring(resHead.length() - len);
resHead = resHead.substring(0, resHead.length() - len);
branchAll.replaceAll(s -> tail2Head + s);
}
if (bracketDepth == 0) {
// top level: no brackets, but keep a shared head such as "A栋-" when every word starts with it
for (String s : branchAll) {
res.add(resHead + s);
}
} else {
String[] curBracket = bracket[bracketDepth % bracket.length];
StringJoiner sjChild = new StringJoiner("、", curBracket[0], curBracket[1]);
branchAll.forEach(sjChild::add);
res.add(resHead + sjChild);
}
} else {
// single child: keep extending the current string
Map.Entry<Character, Node> entry = sons.entrySet().iterator().next();
resTemp[end] = entry.getKey();
Node next = entry.getValue();
getMergeHeadList_inner(res, next, next.sons, resTemp, start, end + 1, bracketDepth);
}
}
public static class Node {
public HashMap<Character, Node> sons;
public boolean isEnd;
public int length;
public Node() {
sons = new HashMap<>();
isEnd = false;
}
}
public CharDicByMapMergeSameHead(Collection<String> list) {
root = new Node();
generateNodeByStringList(list);
}
public Node root;
public int avgLen;
public int mostLen;
public int maxLen;
int mostCount;
int distinctCount;
public void generateNodeByStringList(Collection<String> list) {
if (list == null || list.isEmpty()) {
return;
}
Map<Integer, Integer> map = new HashMap<>();
long totalLen = 0;
for (String f : list) {
int length = f.length();
totalLen += length;
map.put(length, map.getOrDefault(length, 0) + 1);
addSingle(f);
}
avgLen = (int) (totalLen / list.size()) + 1;
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
if (entry.getValue() > mostCount) {
mostCount = entry.getValue();
mostLen = entry.getKey();
}
}
}
public void addSingle(String f) {
int length = f.length();
maxLen = Math.max(maxLen, length);
while (forPrintArr.length < maxLen) {
forPrintArr = Arrays.copyOf(forPrintArr, forPrintArr.length << 1);
}
Node ro = root;
// build the trie front to back (prefix order)
for (int i = 0; i < length; i++) {
char c = f.charAt(i);
// for purely upper- or lowercase letters you could index by c - 'A' or c - 'a'; otherwise use the character code itself
if (ro.sons.get(c) == null) {
// root.isEnd=false;
ro.sons.put(c, new Node());
}
ro = ro.sons.get(c);
}
if (!ro.isEnd) {
distinctCount++;
}
ro.isEnd = true;
ro.length = length;
}
}
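The `len` / `tail2Head` step in `getMergeHeadList_inner` moves the digits at the end of the shared head into every branch, so a room number is not split across the bracket. Below, the idea is pulled out as a hypothetical standalone helper. The name and signature are not from the post.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailingDigitShift {
    // Hypothetical helper: moves the run of ASCII digits at the end of `head`
    // onto the front of every branch, then joins the branches in brackets.
    static String mergeWithDigitShift(String head, List<String> branches) {
        int cut = head.length();
        // scan backwards, but never past the start of the head itself
        while (cut > 0 && head.charAt(cut - 1) >= '0' && head.charAt(cut - 1) <= '9') {
            cut--;
        }
        String digits = head.substring(cut);
        List<String> moved = new ArrayList<>();
        for (String b : branches) {
            moved.add(digits + b);
        }
        return head.substring(0, cut) + "(" + String.join("、", moved) + ")";
    }

    public static void main(String[] args) {
        // a trie branch at "1F-10" whose children are "1", "2", "77"
        System.out.println(mergeWithDigitShift("1F-10", List.of("1", "2", "77")));
        // prints 1F-(101、102、1077)
    }
}
```

Applied to the branch at 3F-31 for B栋-3F-310 and B栋-3F-3144, it gives 3F-(310、3144) instead of 3F-31(0、44).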
The following is the old version.
Method 2 - trie - old
import java.util.*;
import java.util.stream.Collectors;
public class 用字典树合并相同头 {
static Map<Integer, String> starMap = new HashMap<>();
public static void main(String[] args) {
获取共同前缀的结果();
}
private static void 获取共同前缀的结果() {
List<String> list = Arrays.asList(("A栋-1F-101,A栋-1F-102,A栋-2F-202,A栋-3F-302,A栋-整栋,B栋-3F-310,B栋-3F-311,B栋-4F-整层," +
"C栋-3F,C栋-4F,D栋-7F,E栋-4444F,E栋-666666F,hhhhhh栋-3F,hhhhhh栋-4F").split(","));
用字典树合并相同头 charDicWithMap = new 用字典树合并相同头(list);
List<String> mergeHeadList = charDicWithMap.getMergeHeadList();
System.out.println(mergeHeadList);
}
public List<String> getMergeHeadList() {
HashMap<Character, Node> sons = root.sons;
List<String> res = new ArrayList<>();
char[] resTemp = new char[maxLen];
getMergeHeadList_inner(res, sons, resTemp, 0, false);
// getMergeHeadList_inner_v2(res, sons, resTemp, 0, 0);
return res;
}
private void getMergeHeadList_inner(List<String> res, HashMap<Character, Node> sons, char[] resTemp, int index, boolean isBranch) {
if (!isBranch) {
String resHead = new String(resTemp, 0, index);
// sons.size() >= 2 &&
// if (index > 0 && (resTemp[index - 1] == '-' || index >= 5)) {
if (sons.size() > 1 && index > 0) {
List<String> branchAll = new ArrayList<>();
for (Map.Entry<Character, Node> entry : sons.entrySet()) {
char[] charsChild = new char[resTemp.length - index];
charsChild[0] = entry.getKey();
List<String> branch = new ArrayList<>();
getMergeHeadList_inner(branch, entry.getValue().sons, charsChild, 1, true);
branchAll.addAll(branch);
}
if (branchAll.size() > 1) {
res.add(branchAll.stream().collect(Collectors.joining("、", resHead + "(", ")")));
} else if (branchAll.size() == 1) {
res.add(resHead + branchAll.get(0));
}
return;
}
}
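for (Map.Entry<Character, Node> entry : sons.entrySet()) {
resTemp[index] = entry.getKey();
if (entry.getValue().isEnd) {
res.add(new String(resTemp, 0, index + 1));
}
// keep descending past a complete word, so longer words that share it as a prefix
// (e.g. A栋-1F-102 and A栋-1F-102-4号) are not dropped
if (!entry.getValue().sons.isEmpty()) {
getMergeHeadList_inner(res, entry.getValue().sons, resTemp, index + 1, isBranch);
}
}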
}
static String[][] kuohao = new String[][]{
{"(", ")"},
{"<", ">"},
{"[", "]"},
{"{", "}"}
};
private void getMergeHeadList_inner_v2(List<String> res, HashMap<Character, Node> sons, char[] resTemp, int index, int depth) {
String resHead = new String(resTemp, 0, index);
// branch only right after the separator; otherwise A栋-1f-101, A栋-1f-1024 give the odd A栋-1f-10(1,24) mentioned earlier
if (index > 0 && resTemp[index - 1] == '-') {
// System.out.println(" new branch detected, current string: " + new String(resTemp, 0, index));
List<String> branchAll = new ArrayList<>();
for (Map.Entry<Character, Node> entry : sons.entrySet()) {
char[] charsChild = new char[resTemp.length - index];
charsChild[0] = entry.getKey();
List<String> branch = new ArrayList<>();
getMergeHeadList_inner_v2(branch, entry.getValue().sons, charsChild, 1, depth + 1);
branchAll.addAll(branch);
}
if (branchAll.size() > 1) {
res.add(branchAll.stream().collect(Collectors.joining("、", resHead.substring(0, resHead.length() - 1) + kuohao[depth % kuohao.length][0], kuohao[depth % kuohao.length][1])));
} else if (branchAll.size() == 1) {
res.add(resHead + branchAll.get(0));
}
return;
}
// System.out.println(" [non-branch path] current string: " + new String(resTemp, 0, index));
for (Map.Entry<Character, Node> entry : sons.entrySet()) {
resTemp[index] = entry.getKey();
if (entry.getValue().isEnd) {
res.add(new String(resTemp, 0, index + 1));
}
// keep descending past a complete word so longer words sharing it as a prefix are not dropped
if (!entry.getValue().sons.isEmpty()) {
getMergeHeadList_inner_v2(res, entry.getValue().sons, resTemp, index + 1, depth);
}
}
}
public static class Node {
public HashMap<Character, Node> sons;
public boolean isEnd;
public int length;
public Node() {
sons = new HashMap<>();
isEnd = false;
}
}
public 用字典树合并相同头() {
root = new Node();
}
public 用字典树合并相同头(Collection<String> list) {
root = new Node();
generateNodeByStringList(list);
}
public void generateNodeByStringList(Collection<String> list) {
Map<Integer, Integer> map = new HashMap<>();
long totalLen = 0;
for (String f : list) {
int length = f.length();
totalLen += length;
maxLen = Math.max(maxLen, length);
map.put(length, map.getOrDefault(length, 0) + 1);
if (!starMap.containsKey(length)) {
starMap.put(length, "*".repeat(length).intern());
}
用字典树合并相同头.Node ro = root;
for (int i = 0; i < length; i++) {
char c = f.charAt(i);
if (ro.sons.get(c) == null) {
ro.sons.put(c, new 用字典树合并相同头.Node());
}
ro = ro.sons.get(c);
}
if (!ro.isEnd) {
distinctCount++;
}
ro.isEnd = true;
ro.length = length;
}
avgLen = (int) (totalLen / list.size()) + 1;
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
if (entry.getValue() > mostCount) {
mostCount = entry.getValue();
mostLen = entry.getKey();
}
}
System.out.println(" distinct words in trie: " + distinctCount + " , most common length: " + mostLen + " , max length: " + maxLen + " , average length: " + avgLen);
}
public Node root;
/**
* Average length
*/
public int avgLen;
/**
* Most common length (mode)
*/
public int mostLen;
/**
* Maximum length
*/
public int maxLen;
int mostCount;
int distinctCount;
}