Hash Functions

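This article introduces several classic hash functions, including djb2, sdbm, and lose lose, and gives an implementation of each. Comparing their characteristics helps the reader choose a suitable hashing algorithm.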

A comprehensive collection of hash functions, a hash visualiser and some test results [see McKenzie et al., Selecting a Hashing Algorithm, SP&E 20(2):209-224, Feb 1990] will be available someday. If you just want a good hash function and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes. You are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc. Also see tpop, p. 126, for graphing hash functions.

djb2

This algorithm (k=33) was first reported by Dan Bernstein many years ago in comp.lang.c. Another version of this algorithm (now favored by Bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]. The magic of the number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
    unsigned long
    hash(const unsigned char *str)
    {
        unsigned long hash = 5381;  /* initial seed */
        int c;

        /* consume bytes until the terminating NUL */
        while ((c = *str++) != 0)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }
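
The xor variant mentioned above, hash(i) = hash(i - 1) * 33 ^ str[i], could be sketched as follows. This is only an illustration: the names djb2_xor and bucket_index are hypothetical, and reusing the seed 5381 from the additive version is an assumption of this sketch, not something the text states.

```c
#include <stddef.h>

/* djb2, xor variant: hash(i) = hash(i - 1) * 33 ^ str[i].
 * In C, * binds tighter than ^, so this is (hash * 33) ^ c.
 * Assumption: the seed 5381 is carried over from the additive version. */
unsigned long
djb2_xor(const unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++) != 0)
        hash = ((hash << 5) + hash) ^ (unsigned long)c; /* (hash * 33) ^ c */

    return hash;
}

/* Hypothetical helper: reduce a hash value to a bucket index.
 * table_size must be non-zero. */
size_t
bucket_index(unsigned long hash, size_t table_size)
{
    return (size_t)(hash % table_size);
}
```

A caller would typically pass the key as (const unsigned char *)key and then reduce the result with bucket_index(h, table_size) to pick a slot in a table.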

sdbm

This algorithm was created for the sdbm (a public-domain reimplementation of ndbm) database library. It was found to do well in scrambling bits, causing better distribution of the keys and fewer splits. It also happens to be a good general hashing function with good distribution. The actual function is hash(i) = hash(i - 1) * 65599 + str[i]; what is included below is the faster version used in gawk. [There is an even faster, Duff's-device version.] The magic constant 65599 was picked out of thin air while experimenting with different constants, and turns out to be a prime. This is one of the algorithms used in Berkeley DB (see sleepycat) and elsewhere.
    static unsigned long
    sdbm(const unsigned char *str)
    {
        unsigned long hash = 0;
        int c;

        /* (hash << 6) + (hash << 16) - hash is hash * 65599 */
        while ((c = *str++) != 0)
            hash = c + (hash << 6) + (hash << 16) - hash;

        return hash;
    }
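
The shift form is the multiplicative form in disguise: (hash << 6) + (hash << 16) - hash = hash * (64 + 65536 - 1) = hash * 65599. A minimal sketch of the two forms side by side is given here so the identity can be checked; the names sdbm_mul and sdbm_shift are hypothetical. Because unsigned arithmetic in C wraps modulo 2^N, the two should agree even when the hash overflows.

```c
/* Reference form: hash(i) = hash(i - 1) * 65599 + str[i]. */
unsigned long
sdbm_mul(const unsigned char *str)
{
    unsigned long hash = 0;
    int c;

    while ((c = *str++) != 0)
        hash = hash * 65599UL + (unsigned long)c;

    return hash;
}

/* Shift form: 65599 = 2^16 + 2^6 - 1, so the multiply becomes
 * two shifts, one add and one subtract. */
unsigned long
sdbm_shift(const unsigned char *str)
{
    unsigned long hash = 0;
    int c;

    while ((c = *str++) != 0)
        hash = (unsigned long)c + (hash << 6) + (hash << 16) - hash;

    return hash;
}
```

Whether the shift form is actually faster than the multiply depends on the compiler and the machine; on current hardware the difference may be negligible.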

lose lose

This hash function appeared in K&R (1st ed.), but at least the reader was warned: "This is not the best possible algorithm, but it has the merit of extreme simplicity." This is an understatement; it is a terrible hashing algorithm, and it could have been much better without sacrificing its "extreme simplicity." [See the second edition!] Many C programmers use this function without actually testing it, or checking something like Knuth's Sorting and Searching, so it stuck. It is now found mixed with otherwise respectable code, e.g. cnews. Sigh. [See also: tpop]
    unsigned long
    hash(const unsigned char *str)
    {
        unsigned int hash = 0;
        int c;

        /* plain sum of the bytes: character order is ignored */
        while ((c = *str++) != 0)
            hash += c;

        return hash;
    }
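
One way to see why the byte-sum performs so poorly: addition is commutative, so any two keys made of the same characters in a different order get the same hash, and so do unrelated keys whose bytes happen to add up to the same total. The sketch below illustrates this; lose_lose and lose_lose_collides are hypothetical names chosen for the example.

```c
/* Byte-sum hash, same logic as the function above. */
unsigned long
lose_lose(const unsigned char *str)
{
    unsigned int hash = 0;
    int c;

    while ((c = *str++) != 0)
        hash += (unsigned int)c;

    return hash;
}

/* Returns 1 when the two keys collide under lose_lose, 0 otherwise. */
int
lose_lose_collides(const char *a, const char *b)
{
    return lose_lose((const unsigned char *)a) ==
           lose_lose((const unsigned char *)b);
}
```

For example, the anagrams "listen" and "silent" collide, and so do "ad" and "bc" (97 + 100 = 98 + 99). The sum also stays small for short keys, so a large table would see only its low-numbered buckets used.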


