Algorithm Review 14: Trie Concepts and Non-Comparison Sorting


License: CC 4.0 BY-SA

Article tags:

#algorithms #hashing-algorithms #java


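This article introduces the trie (prefix tree) and its operations: insertion, search, deletion, and counting the strings that share a given prefix. It also compares the advantages of a trie over a hash table for string processing. It then explains two non-comparison sorting algorithms, counting sort and radix sort, analyzes their time complexity and stability, and discusses how sorting algorithms are tuned in engineering practice.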

Tries and Non-Comparison Sorting

The trie concept

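1) For each string, add its characters from front to back onto a multi-way tree
2) Characters live on the edges (paths); each node holds its own data items (most commonly pass and end)
3) Add every sample this way: if the path does not exist yet, create it; if it exists, reuse it
4) Every node along the way gets pass + 1, and the node where each string ends gets end + 1; with these two counters, prefix-related queries can be answered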

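Compare this with a hash table: it must compute a hash over every character of a string, so with one million strings of average length 100, each insert or lookup still costs O(100) per string, the same order as a trie. A hash table also cannot answer "which stored strings start with this prefix" without scanning every key, whereas a trie can.
Each trie operation costs O(M), where M is the number of characters in the string involved.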
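To make the prefix-query point concrete, here is a minimal sketch (the class name HashPrefixDemo is hypothetical, not from the original) that stores word counts in a java.util.HashMap. An exact lookup costs one hash computation, but prefixNumber has no way to jump to the matching keys, so it scans every stored key with String.startsWith. A trie reads the same answer from the pass value of a single node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a HashMap counts exact words cheaply,
// but a prefix query has to visit every stored key.
class HashPrefixDemo {
    private final Map<String, Integer> counts = new HashMap<>();

    // hashing a word of length L costs O(L), the same order as a trie insert
    void insert(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // exact-match lookup: one hash computation
    int search(String word) {
        return counts.getOrDefault(word, 0);
    }

    // prefix query: every distinct key is checked with startsWith
    int prefixNumber(String pre) {
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().startsWith(pre)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        HashPrefixDemo d = new HashPrefixDemo();
        d.insert("abc");
        d.insert("abd");
        d.insert("abc");
        d.insert("b");
        System.out.println(d.search("abc"));      // 2
        System.out.println(d.prefixNumber("ab")); // 3
    }
}
```

The cost of prefixNumber here grows with the number of stored keys, not with the length of the prefix.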
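A trie lets the user:
1) void insert(String str): add a string; the same string may be added repeatedly, and each addition counts once
2) int search(String str): return how many copies of the string are currently stored
3) void delete(String str): remove one copy of the string; repeated deletes each remove one copy
4) int prefixNumber(String str): return how many stored strings have str as a prefix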

```java
	// Trie over the lowercase letters 'a'..'z': every node has 26 fixed slots.
	// Any other character would index outside nexts, so input is assumed to be lowercase.
	public static class Node1 {
		public int pass; // how many inserted strings pass through this node
		public int end;  // how many inserted strings end at this node
		public Node1[] nexts;

		// char tmp = 'b'  ->  (tmp - 'a') == 1 is the path index
		public Node1() {
			pass = 0;
			end = 0;
			// 0    a
			// 1    b
			// 2    c
			// ..   ..
			// 25   z
			// nexts[i] == null   the path in direction i does not exist
			// nexts[i] != null   the path in direction i exists
			nexts = new Node1[26];
		}
	}

	public static class Trie1 {
		private Node1 root;

		public Trie1() {
			root = new Node1();
		}

		public void insert(String word) {
			if (word == null) {
				return;
			}
			char[] str = word.toCharArray();
			Node1 node = root;
			node.pass++; // root.pass counts every inserted string (prefix "")
			int path = 0;
			for (int i = 0; i < str.length; i++) { // traverse characters left to right
				path = str[i] - 'a'; // map the character to the path to take
				if (node.nexts[path] == null) {
					node.nexts[path] = new Node1();
				}
				node = node.nexts[path]; // move node down along the path
				node.pass++; // one more string passes through this node
			}
			node.end++; // node is now at the last character's node: end + 1
		}

		// removes one copy of word; does nothing if word is not stored
		public void delete(String word) {
			if (search(word) != 0) {
				char[] chs = word.toCharArray();
				Node1 node = root;
				node.pass--;
				int path = 0;
				for (int i = 0; i < chs.length; i++) {
					path = chs[i] - 'a';
					// pass dropped to 0: no remaining string uses this subtree, so cut it off
					if (--node.nexts[path].pass == 0) {
						node.nexts[path] = null;
						return;
					}
					node = node.nexts[path];
				}
				node.end--;
			}
		}

		// how many times word has been inserted so far
		public int search(String word) {
			if (word == null) {
				return 0;
			}
			char[] chs = word.toCharArray();
			Node1 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = chs[i] - 'a';
				if (node.nexts[index] == null) {
					return 0;
				}
				node = node.nexts[index];
			}
			return node.end;
		}

		// how many of the inserted strings have pre as a prefix
		public int prefixNumber(String pre) {
			if (pre == null) {
				return 0;
			}
			char[] chs = pre.toCharArray();
			Node1 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = chs[i] - 'a';
				if (node.nexts[index] == null) {
					return 0;
				}
				node = node.nexts[index];
			}
			return node.pass;
		}
	}
```

```java
	// requires: import java.util.HashMap;
	public static class Node2 {
		public int pass;
		public int end;
		// when the character set is large, store the paths in a hash map;
		// the key is the character's code converted to an int
		public HashMap<Integer, Node2> nexts;

		public Node2() {
			pass = 0;
			end = 0;
			nexts = new HashMap<>();
		}
	}

	public static class Trie2 {
		private Node2 root;

		public Trie2() {
			root = new Node2();
		}

		public void insert(String word) {
			if (word == null) {
				return;
			}
			char[] chs = word.toCharArray();
			Node2 node = root;
			node.pass++;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					node.nexts.put(index, new Node2());
				}
				node = node.nexts.get(index);
				node.pass++;
			}
			node.end++;
		}

		public void delete(String word) {
			if (search(word) != 0) {
				char[] chs = word.toCharArray();
				Node2 node = root;
				node.pass--;
				int index = 0;
				for (int i = 0; i < chs.length; i++) {
					index = (int) chs[i];
					// pass dropped to 0: remove the whole subtree below this path
					if (--node.nexts.get(index).pass == 0) {
						node.nexts.remove(index);
						return;
					}
					node = node.nexts.get(index);
				}
				node.end--;
			}
		}

		// how many times word has been inserted so far
		public int search(String word) {
			if (word == null) {
				return 0;
			}
			char[] chs = word.toCharArray();
			Node2 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					return 0;
				}
				node = node.nexts.get(index);
			}
			return node.end;
		}

		// how many of the inserted strings have pre as a prefix
		public int prefixNumber(String pre) {
			if (pre == null) {
				return 0;
			}
			char[] chs = pre.toCharArray();
			Node2 node = root;
			int index = 0;
			for (int i = 0; i < chs.length; i++) {
				index = (int) chs[i];
				if (!node.nexts.containsKey(index)) {
					return 0;
				}
				node = node.nexts.get(index);
			}
			return node.pass;
		}
	}
```

Counting sort & radix sort (based on the idea of bucket sort)

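Their time complexity can be brought down to O(N), whereas comparison-based sorting can do no better than O(N*logN).
However, non-comparison sorts are much less general: they only work when the data has a known, limited range or a digit structure.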

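Counting sort: suppose we need to sort an unordered array of ages, and we know every age lies in the range
0-200. Create a count array whose indices run from 0 to the maximum age, and for each person increment the
counter at the index equal to that person's age. Then walk the count array from left to right and write each index back as many times as its count.
The code is as follows: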

```java
	// only for small non-negative values (e.g. ages in 0~200)
	public static void countSort(int[] arr) {
		if (arr == null || arr.length < 2) {
			return;
		}
		int max = Integer.MIN_VALUE;
		for (int i = 0; i < arr.length; i++) {
			max = Math.max(max, arr[i]);
		}
		int[] bucket = new int[max + 1]; // bucket[v] = how many times v occurs
		for (int i = 0; i < arr.length; i++) {
			bucket[arr[i]]++;
		}
		int i = 0;
		for (int j = 0; j < bucket.length; j++) { // write each value back bucket[j] times
			while (bucket[j]-- > 0) {
				arr[i++] = j;
			}
		}
	}
```
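The countSort above only rewrites integer values. When the ages belong to objects, which is the situation the example describes, each record has to be moved as a whole. A minimal sketch, assuming a hypothetical Person class with an age in 0..maxAge, places records using prefix sums of the counts and a right-to-left scan, so people with the same age keep their input order:

```java
import java.util.Arrays;

// Hypothetical record type for the "sort people by age" example.
class Person {
    final String name;
    final int age; // assumed to lie in 0..maxAge

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name + "(" + age + ")";
    }
}

class StableCountingSort {
    // O(N + maxAge) time; people with equal ages keep their input order
    static Person[] sortByAge(Person[] people, int maxAge) {
        int[] count = new int[maxAge + 1];
        for (Person p : people) {
            count[p.age]++;                        // frequency of each age
        }
        for (int a = 1; a <= maxAge; a++) {
            count[a] += count[a - 1];              // count[a] = how many ages are <= a
        }
        Person[] out = new Person[people.length];
        for (int i = people.length - 1; i >= 0; i--) { // right to left keeps it stable
            Person p = people[i];
            out[--count[p.age]] = p;               // last free slot for this age
        }
        return out;
    }

    public static void main(String[] args) {
        Person[] people = {
            new Person("Ann", 30), new Person("Bob", 25),
            new Person("Cid", 30), new Person("Dee", 25)
        };
        System.out.println(Arrays.toString(sortByAge(people, 200)));
        // [Bob(25), Dee(25), Ann(30), Cid(30)]
    }
}
```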

**Radix sort:** the values must be non-negative numbers that can be interpreted in decimal.
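Consider the following numbers:
103, 13, 27, 25, 17, 9
First find the maximum, 103, which has 3 decimal digits; pad every number with fewer than three digits with leading zeros:
103, 013, 027, 025, 017, 009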
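Prepare ten buckets, numbered 0 to 9 (one for each decimal digit).
Then, going from left to right, put every number into the bucket that matches its ones digit.

Then pour the numbers out of buckets 0 through 9 in order; within a bucket, the number that went in first comes out first:
103, 013, 025, 027, 017, 009
Next, every number goes into a bucket according to its tens digit.

Then all the numbers are poured out again:
103, 009, 013, 017, 025, 027
Finally, the numbers go into buckets by their hundreds digit,
and are taken out in order once more:
009, 013, 017, 025, 027, 103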
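The bucket procedure walked through above can be sketched with ten FIFO queues (java.util.ArrayDeque). This is a minimal sketch for non-negative ints; the class name QueueRadixSort and the digits parameter are hypothetical. It prints the array after each pass so the output can be compared with the three sequences above (leading zeros are not printed):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch of LSD radix sort with one queue per decimal digit (non-negative values only).
class QueueRadixSort {
    static void sort(int[] arr, int digits) {
        List<Queue<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++) {
            buckets.add(new ArrayDeque<>()); // bucket b holds numbers whose current digit is b
        }
        int divisor = 1; // 1 -> ones digit, 10 -> tens digit, 100 -> hundreds digit
        for (int d = 1; d <= digits; d++, divisor *= 10) {
            for (int x : arr) {
                buckets.get((x / divisor) % 10).add(x); // enter the bucket for the current digit
            }
            int i = 0;
            for (Queue<Integer> q : buckets) {          // pour out buckets 0..9, first in first out
                while (!q.isEmpty()) {
                    arr[i++] = q.poll();
                }
            }
            System.out.println("after pass " + d + ": " + Arrays.toString(arr));
        }
    }

    public static void main(String[] args) {
        int[] arr = {103, 13, 27, 25, 17, 9};
        sort(arr, 3);
        // after pass 1: [103, 13, 25, 27, 17, 9]
        // after pass 2: [103, 9, 13, 17, 25, 27]
        // after pass 3: [9, 13, 17, 25, 27, 103]
    }
}
```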
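The bucket procedure above is cumbersome because it needs ten queues; here is a more elegant way to write it.
Consider these numbers:
101, 001, 022, 031, 040
Prepare a count array recording how many times each digit appears in the ones place: count[0] = 1 (040), count[1] = 3 (101, 001, 031), count[2] = 1 (022), and every other entry is 0.

Then turn count into count', its prefix-sum form: count'[0] = 1, count'[1] = 4, and count'[2] through count'[9] are all 5.
This means exactly one number has a ones digit less than or equal to 0, and four numbers have a ones digit less than or equal to 1.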

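Prepare a helper array help of the same length, and traverse the original array from right to left:
040 has ones digit 0 and count'[0] = 1, so 040 goes into position 0; then count'[0] is decremented to 0.
Next comes 031. Four numbers have a ones digit less than or equal to 1, so they must occupy positions 0-3;
031 is the rightmost number whose ones digit is 1, so it takes the last of those positions, position 3.
Then count'[1] is decremented to 3.

Next comes 022. Five numbers have a ones digit less than or equal to 2,
so 022 goes into position 4, and count'[2] is decremented to 4.

Next comes 001. Now count'[1] = 3, so 001 goes into position 2,
and count'[1] is decremented to 2.
The last number, 101, goes into position 1, and count'[1] is decremented to 1. The array is now sorted by its ones digit: 040, 101, 001, 031, 022.

This elegantly reproduces the effect of pouring the numbers out of the buckets in order.

count' only answers one question: for a number whose current digit has a given value, which position should it fill. This avoids allocating many queues
to do the same job.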
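One count' pass on the example numbers (written without leading zeros as 101, 1, 22, 31, 40) can be checked with the minimal sketch below. The class CountPrimePass and its divisor parameter (1 for the ones digit, 10 for tens, 100 for hundreds) are hypothetical names; the logic is the same prefix-sum placement described above.

```java
import java.util.Arrays;

// Sketch: one stable pass on a single decimal digit using count' (prefix sums) instead of queues.
class CountPrimePass {
    static int[] passOnDigit(int[] arr, int divisor) {
        int[] count = new int[10];
        for (int x : arr) {
            count[(x / divisor) % 10]++;              // frequency of each digit value
        }
        for (int k = 1; k < 10; k++) {
            count[k] += count[k - 1];                 // count'[k] = how many have digit <= k
        }
        int[] help = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) {   // traverse from right to left
            int digit = (arr[i] / divisor) % 10;
            help[--count[digit]] = arr[i];            // last free slot for this digit
        }
        return help;
    }

    public static void main(String[] args) {
        int[] arr = {101, 1, 22, 31, 40};
        System.out.println(Arrays.toString(passOnDigit(arr, 1)));
        // [40, 101, 1, 31, 22]
    }
}
```

Running the pass again with divisor 10 and then 100 on the result finishes the sort: [1, 22, 31, 40, 101].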


```java
	// only for non-negative values
	public static void radixSort(int[] arr) {
		if (arr == null || arr.length < 2) {
			return;
		}
		// the number of decimal digits of the maximum is passed as the fourth argument
		radixSort(arr, 0, arr.length - 1, maxbits(arr));
	}

	// number of decimal digits of the largest value
	public static int maxbits(int[] arr) {
		int max = Integer.MIN_VALUE;
		for (int i = 0; i < arr.length; i++) {
			max = Math.max(max, arr[i]);
		}
		int res = 0;
		while (max != 0) {
			res++;
			max /= 10;
		}
		return res;
	}

	// sorts arr[L..R]; digit = number of decimal digits of the maximum
	public static void radixSort(int[] arr, int L, int R, int digit) {
		final int radix = 10; // digits 0-9
		int i = 0, j = 0;
		// one auxiliary slot per number
		int[] help = new int[R - L + 1];
		for (int d = 1; d <= digit; d++) { // one enter/leave round per digit
			// 10 slots
			// count[0] how many numbers have 0 on the current (d-th) digit
			// count[1] how many numbers have 0 or 1 on the current digit
			// count[2] how many numbers have 0, 1 or 2 on the current digit
			// count[i] how many numbers have 0~i on the current digit
			int[] count = new int[radix]; // count[0..9]
			for (i = L; i <= R; i++) {
				// 103, d = 1 -> digit 3
				// 209, d = 1 -> digit 9
				// frequency of each digit value on the d-th digit
				j = getDigit(arr[i], d);
				count[j]++;
			}
			// turn count into count' (prefix sums)
			for (i = 1; i < radix; i++) {
				count[i] = count[i] + count[i - 1];
			}
			// fill help in reverse order (right to left)
			for (i = R; i >= L; i--) {
				j = getDigit(arr[i], d);
				// if count[j] is 5, this number goes to index 4: range 0~4 holds exactly 5 numbers
				help[count[j] - 1] = arr[i];
				count[j]--; // then decrement the frequency for this digit
			}
			for (i = L, j = 0; i <= R; i++, j++) {
				// copy help back for the next round
				arr[i] = help[j];
			}
		}
	}

	// d-th decimal digit of x, counting from the ones digit (d = 1)
	public static int getDigit(int x, int d) {
		return ((x / ((int) Math.pow(10, d - 1))) % 10);
	}
```

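Time complexity estimate:
Each round traverses all N numbers, and the numbers enter and leave the buckets once per decimal digit of the maximum, i.e. about log10(max) rounds.
The total is O(N*log10(max)); when the value range is bounded, log10(max) is a constant, so this is treated as O(N).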

Stability of sorting

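Stability means that samples of equal value keep their relative order after sorting. For primitive types, stability makes no difference; for non-primitive types (objects sorted by one of their fields), it matters a great deal. Some sorting algorithms can be implemented to be stable, while others cannot be made stable no matter how they are implemented.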

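For example, when shopping in the price range 300-500, first sort the items by price from low to high, then sort them by rating from high to low. If both
sorts are stable, items with the same rating stay in price order after the second sort, so the items at the top are both well rated and cheap.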
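The two-pass idea can be sketched in Java, whose List.sort is documented as a stable sort. Item, its fields, and TwoPassStableSort are hypothetical names, and rating is assumed to be "higher is better":

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical product type for the shopping example.
class Item {
    final String name;
    final int price;
    final int rating; // higher is better

    Item(String name, int price, int rating) {
        this.name = name;
        this.price = price;
        this.rating = rating;
    }
}

class TwoPassStableSort {
    static List<String> rank(List<Item> items) {
        List<Item> list = new ArrayList<>(items);
        // pass 1: price from low to high
        list.sort(Comparator.comparingInt(it -> it.price));
        // pass 2: rating from high to low; List.sort is stable,
        // so items with equal ratings stay cheapest-first
        list.sort(Comparator.comparingInt((Item it) -> it.rating).reversed());
        List<String> names = new ArrayList<>();
        for (Item it : list) {
            names.add(it.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
            new Item("A", 480, 4), new Item("B", 320, 5),
            new Item("C", 450, 5), new Item("D", 300, 4));
        System.out.println(rank(items)); // [B, C, D, A]
    }
}
```

With an unstable second pass, C could end up ahead of B even though B is cheaper.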

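Selection sort (unstable)
Bubble sort (stable)
Insertion sort (stable)
Merge sort (stable)
Quicksort (unstable)
Heap sort (unstable)
Radix sort (stable)
Counting sort (stable)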
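To see why selection sort is listed as unstable, here is a minimal sketch of the standard selection sort run on {key, tag} pairs (the pair encoding and class name are hypothetical). Only the key is compared; the tag records the original order. The long-distance swap that moves the minimum to the front carries the first 5 past the second 5:

```java
import java.util.Arrays;

// Sketch: plain selection sort on {key, tag} pairs, showing the lost relative order.
class SelectionSortStability {
    // each element is {key, tag}; only key is compared
    static void selectionSort(int[][] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j][0] < a[min][0]) {
                    min = j;
                }
            }
            int[] tmp = a[i]; // this swap can jump over elements with equal keys
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[][] a = {{5, 1}, {5, 2}, {2, 3}};
        selectionSort(a);
        System.out.println(Arrays.deepToString(a)); // [[2, 3], [5, 2], [5, 1]]
    }
}
```

The two 5s started in tag order 1, 2 and finish as 2, 1.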

Summary of sorting algorithms


Quicksort has the smallest constant factor of these algorithms, so in practice it is usually the fastest.

Common pitfalls

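1) Merge sort's extra space can be reduced to O(1) with the "merge sort internal buffer" method, but
it then stops being stable (at that point you might as well use heap sort)
2) "In-place merge sort" raises the time complexity to O(N^2) (at that point you might as well use bubble sort)
3) Quicksort can be made stable with "01 stable sort", but that places more requirements on the data (at that point you might as well use bucket sort)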

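Problem: in an integer array, put the odd numbers on the left and the even numbers on the right, keeping the original relative order among the odd numbers and among the even numbers. The time complexity must be O(N) and the extra space O(1).
With ordinary techniques this cannot be done. It is equivalent to a stable version of quicksort's partition (splitting the array into two sides by one criterion), and a stable partition needs an auxiliary array. Since quicksort's partition cannot achieve it (short of the paper-level "01 stable sort" mentioned above), neither can this problem.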
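Once the O(1) space requirement is dropped, the problem becomes easy. A minimal sketch (class and method names are hypothetical) copies the odd numbers and then the even numbers into a helper array. Both groups keep their original order, in O(N) time and O(N) extra space:

```java
import java.util.Arrays;

// Sketch: stable "odds left, evens right" in O(N) time with an O(N) helper array.
class StableOddEvenSplit {
    static void split(int[] arr) {
        int[] help = new int[arr.length];
        int k = 0;
        for (int x : arr) {
            if (x % 2 != 0) { // odd; x % 2 is -1 for negative odds, so test != 0
                help[k++] = x;
            }
        }
        for (int x : arr) {
            if (x % 2 == 0) { // even
                help[k++] = x;
            }
        }
        System.arraycopy(help, 0, arr, 0, arr.length);
    }

    public static void main(String[] args) {
        int[] arr = {4, 7, 2, -3, 6, 5, 8};
        split(arr);
        System.out.println(Arrays.toString(arr)); // [7, -3, 5, 4, 2, 6, 8]
    }
}
```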