Python Algorithms (5): Hash Functions (Hash Tables), the RandomPool Structure, the Union-Find Structure, and the Island Problem

lllong33

License: CC 4.0 BY-SA

Category: Algorithms

Article link: https://blog.youkuaiyun.com/lllong33/article/details/94591132

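Hash function: also known as a hashing algorithm, it is a method for creating a small numeric "fingerprint" from any kind of data.

It compresses a message or piece of data into a digest, so the data becomes smaller and takes a fixed format.
Put another way, functions such as MD5 and SHA map a large input set onto a small output set in a seemingly random way.
The outputs are spread evenly over the small set (uniformity).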
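
As a quick illustration of the two properties described above (fixed-size output and uniformity), the sketch below hashes strings of different lengths with `hashlib.md5`, then counts how the digests of 10,000 keys fall into 10 buckets. The helper name `md5_int`, the key format `key-<i>`, and the bucket count are arbitrary choices made for this example only.

```python
import hashlib
from collections import Counter


def md5_int(text):
    """Return the MD5 digest of text as a non-negative integer."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


# Inputs of any length produce a digest of the same length.
for text in ["a", "hello world", "x" * 10000]:
    print(len(hashlib.md5(text.encode("utf-8")).hexdigest()))

# Reduce each digest modulo 10 and count the hits per bucket.
buckets = Counter(md5_int("key-%d" % i) % 10 for i in range(10000))
print(sorted(buckets.items()))
```

Each bucket should receive roughly a tenth of the keys; the exact counts depend on the keys chosen.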

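Hash function techniques

Write the output as 16 hexadecimal digits, each one of 0-f; the value range is then [0, 16**16 - 1] = [0, 2**64 - 1]. (A real MD5 digest is longer: 128 bits, or 32 hexadecimal digits.)
The input domain is unbounded; the output size is fixed.
Hash collision: two different inputs produce the same output.
When the input domain is large, the return values are spread uniformly over the output range.
Hashing yields a 16-digit number; split it into the high 8 digits h1 and the low 8 digits h2 (or use two separate hash functions such as MD5 and SHA). Because every digit is independent of the others, h1 + n*h2 = x_n defines a new hash value for each n, so n = 1..1000 gives 1000 hash functions.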
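
The last technique can be sketched as follows. This is only an illustration under stated assumptions: it uses `hashlib.md5`, whose digest has 32 hexadecimal digits, so the two halves are 16 digits each rather than 8, and the function name `derived_hashes` and the `table_size` parameter are invented for the example.

```python
import hashlib


def derived_hashes(key, count, table_size):
    """Return `count` hash values for `key`, each in [0, table_size)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()  # 32 hex digits
    h1 = int(digest[:16], 16)  # high half of the digest
    h2 = int(digest[16:], 16)  # low half of the digest
    # The n-th derived hash is h1 + n*h2, reduced to the table size.
    return [(h1 + n * h2) % table_size for n in range(1, count + 1)]


print(derived_hashes("apple", 5, 1000))
```

Only one real hash computation is needed per key, however many derived values are requested.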

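Hash table: a data structure that accesses a storage location in memory directly from a key. The array that holds the records is called the hash table.

A function of the key is computed to map the record being looked up to a position in the table, which speeds up lookups.
Each new key's hash is reduced modulo the table size, and the entry is hung on the corresponding slot.
Two practical details are resizing and the red-black trees used by the JVM (Java 8's HashMap turns a long bucket chain into a red-black tree). Resizing is often costed at O(logN) per insertion; with capacity doubling the amortized cost is O(1), and offline resizing also reaches O(1). By default, hash table operations are treated as O(1).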
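
To make the "modulo, then hang on the slot" idea and the resizing step concrete, here is a toy hash table with separate chaining. It is a minimal sketch, not how Python's `dict` or the JVM's HashMap is implemented: the class name `ChainedHashTable`, the initial capacity of 8, and the rule "resize when entries exceed twice the bucket count" are all assumptions chosen for the example.

```python
class ChainedHashTable:
    """A toy hash table: separate chaining plus capacity doubling."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0

    def _slot(self, key):
        # Reduce the key's hash modulo the table size to pick a bucket.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > 2 * len(self.buckets):  # chains are getting long
            self._resize()

    def get(self, key, default=None):
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the capacity and re-hash every entry into the new table.
        old_entries = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in old_entries:
            self.buckets[self._slot(k)].append((k, v))


table = ChainedHashTable()
for i in range(100):
    table.put("k%d" % i, i)
print(table.get("k42"), len(table.buckets))
```

Because the capacity doubles each time, the total re-hashing work over N insertions stays proportional to N, which is where the amortized O(1) figure comes from.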

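1. Designing the RandomPool structure

[Problem] Design a structure that supports the following three operations:

insert(key): add a key to the structure, without adding duplicates.

delete(key): remove a key that is currently in the structure.

getRandom(): return any key in the structure with equal probability.

[Requirement] insert, delete, and getRandom must all run in O(1) time.

Approach: keep two dictionaries, key-index and index-key, and a size variable that getRandom uses to pick a random index. For delete, move the last entry into the slot of the key being deleted, so the indices stay contiguous.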

import random


class RandomPool:
    def __init__(self):
        self.key_index = {}  # key -> index
        self.index_key = {}  # index -> key
        self.size = 0

    def insert(self, key):
        if key not in self.key_index:
            self.key_index[key] = self.size
            self.index_key[self.size] = key
            self.size += 1

    def delete(self, key):
        if key in self.key_index:
            delete_index = self.key_index[key]
            last_index = self.size - 1
            last_key = self.index_key[last_index]
            # Move the last key into the slot being vacated.
            self.key_index[last_key] = delete_index
            self.index_key[delete_index] = last_key
            self.key_index.pop(key)
            self.index_key.pop(last_index)
            # Shrink the pool so get_random never picks a removed index.
            self.size -= 1

    def get_random(self):
        if self.size == 0:
            return None
        random_index = random.randrange(self.size)  # 0 ~ size-1
        return self.index_key[random_index]

    def main(self):
        pool = RandomPool()
        pool.insert("A")
        pool.insert("B")
        pool.insert("C")
        print(pool.key_index, pool.index_key)
        pool.delete('A')
        print(pool.key_index, pool.index_key)
        print(pool.get_random())
        print(pool.get_random())
        print(pool.get_random())
        print(pool.get_random())

RandomPool().main()
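
A common variant of the structure above replaces the index-key dictionary with a plain list, since a list already maps contiguous indices to values. The sketch below is one possible way to write it, not the author's version; the class name `ListRandomPool` is invented for the example.

```python
import random


class ListRandomPool:
    def __init__(self):
        self.keys = []       # index -> key
        self.key_index = {}  # key -> index

    def insert(self, key):
        if key not in self.key_index:
            self.key_index[key] = len(self.keys)
            self.keys.append(key)

    def delete(self, key):
        if key in self.key_index:
            index = self.key_index.pop(key)
            last_key = self.keys.pop()
            if last_key != key:
                # Move the last key into the freed slot.
                self.keys[index] = last_key
                self.key_index[last_key] = index

    def get_random(self):
        return random.choice(self.keys) if self.keys else None


pool = ListRandomPool()
for k in "ABC":
    pool.insert(k)
pool.delete("A")
print(pool.keys, pool.key_index, pool.get_random())
```

Here the list length plays the role of the size variable, so there is no separate counter to keep in sync.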

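2. The union-find structure

1. Check whether two elements belong to the same set. While checking, re-point every node visited directly at the head node.
2. Merge two sets.
Time complexity: once the number of queries plus the number of merges reaches O(N), the total cost is O(N), i.e. O(1) per operation on average. (Strictly, the amortized bound is O(α(N)), where α is the inverse Ackermann function, which is effectively a constant.)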

class UnionFind:
    def __init__(self):
        self.father_map = {}
        self.size_map = {}

    def make_sets(self, nodes):
        self.father_map = {}
        self.size_map = {}
        for node in nodes:
            self.father_map[node] = node
            self.size_map[node] = 1

    def find_head(self, node):
        father = self.father_map.get(node)
        if father != node:
            father = self.find_head(father)
        # Path compression: point this node directly at the head.
        self.father_map[node] = father
        return father

    def is_same_set(self, a, b):
        return self.find_head(a) == self.find_head(b)

    def union(self, a, b):
        if a is None or b is None:
            return
        a_head = self.find_head(a)
        b_head = self.find_head(b)
        if a_head != b_head:
            a_set_size = self.size_map.get(a_head)
            b_set_size = self.size_map.get(b_head)
            if a_set_size <= b_set_size:
                self.father_map[a_head] = b_head
                self.size_map[b_head] = b_set_size + a_set_size
            else:
                self.father_map[b_head] = a_head
                self.size_map[a_head] = b_set_size + a_set_size

    def main(self):
        nodes1 = [1,2,3,4,5,6]
        self.make_sets(nodes1)
        self.find_head(1)
        self.union(1,2)
        print(self.is_same_set(1, 3))
        self.union(3, 4)
        self.union(2,4)
        print(self.is_same_set(1,3))
        print(self.father_map)
        print(self.size_map)

UnionFind().main()
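
To show how the two operations are typically used together, here is a self-contained sketch that counts how many groups remain among n items after merging a list of pairs. It re-implements union-find compactly with lists instead of dictionaries and writes the path compression as a loop; the function name `count_groups` and the sample pairs are invented for the example.

```python
def count_groups(n, pairs):
    """Count disjoint sets among items 0..n-1 after merging each pair."""
    father = list(range(n))  # each item starts as its own head
    size = [1] * n

    def find_head(x):
        # First pass: walk up to the head.
        head = x
        while father[head] != head:
            head = father[head]
        # Second pass: point every node on the path directly at the head.
        while father[x] != head:
            father[x], x = head, father[x]
        return head

    groups = n
    for a, b in pairs:
        a_head, b_head = find_head(a), find_head(b)
        if a_head != b_head:
            # Hang the smaller set under the larger one.
            if size[a_head] < size[b_head]:
                a_head, b_head = b_head, a_head
            father[b_head] = a_head
            size[a_head] += size[b_head]
            groups -= 1
    return groups


print(count_groups(6, [(0, 1), (2, 3), (1, 3)]))
```

Each successful merge reduces the number of groups by one, so no final scan over the items is needed.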

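3. The island problem

A matrix contains only the values 0 and 1. Each cell is connected to the cells directly above, below, left, and right of it. A group of 1s that are connected together is called an island. How many islands are there in the matrix?

Approach: scan the matrix directly. On reaching a 1, call infect(), which recursively turns every 1 reachable through the up, down, left, and right neighbors into a 2. Then continue scanning.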

"""
Author: lllong33
date: 2019/7/1
"""
class IsLandas:
    def count_is_lands(self, m):
        # Guard against a missing or empty matrix.
        if not m or not m[0]:
            return 0
        rows = len(m)
        cols = len(m[0])
        res = 0
        for i in range(rows):
            for j in range(cols):
                if m[i][j] == 1:
                    res += 1
                    # Mark the whole island so it is counted only once.
                    self.infect(m, i, j, rows, cols)
        return res

    def infect(self, m, i, j, rows, cols):
        if i < 0 or i >= rows or j < 0 or j >= cols or m[i][j] != 1:
            return
        m[i][j] = 2  # infected; note that m is modified in place
        self.infect(m, i+1, j, rows, cols)
        self.infect(m, i-1, j, rows, cols)
        self.infect(m, i, j+1, rows, cols)
        self.infect(m, i, j-1, rows, cols)

    def main(self):
        m1 = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 1, 0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0]]
        print(self.count_is_lands(m1))
        m2 = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 1, 1, 1, 0, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0]]
        print(self.count_is_lands(m2))

IsLandas().main()
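
The recursive infect() above is fine for small matrices, but on a large island the recursion can grow as deep as the island has cells and may hit Python's default recursion limit. The sketch below is one way to do the same infection with an explicit stack; the function name `count_islands` and the 200 x 200 test grid are assumptions made for the example.

```python
def count_islands(grid):
    """Count islands of 1s; the grid is modified in place (1 -> 2)."""
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != 1:
                continue
            count += 1
            grid[i][j] = 2
            stack = [(i, j)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                        # Mark on push so no cell enters the stack twice.
                        grid[nr][nc] = 2
                        stack.append((nr, nc))
    return count


big = [[1] * 200 for _ in range(200)]  # a single island of 40,000 cells
print(count_islands(big))
```

The visiting order differs from the recursive version, but the set of cells infected per island, and therefore the island count, is the same.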