postgresql取第一行数据_postgresql-从表中随机获取一条记录

本文探讨了在PostgreSQL中如何高效地从大表中随机获取一行数据,比较了`order by random()`、`offset`、`tablesample system`和`tablesample bernoulli`的方法,分析了它们的执行计划、性能和适用场景。实验结果显示,对于大数据量表,`offset`和`tablesample`通常比`order by random()`更快,其中`tablesample system`效率更高,但`tablesample bernoulli`具有更好的随机性。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

postgresql如何从表中高效的随机获取一条记录

select C_BH from db_scld.t_scld_cprscjl order by `random()` LIMIT 1;

select c_jdrybm from db_scld.t_jdry

where c_bmbm = v_scdd and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13') order by `random()` limit 1

db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl order by random() LIMIT 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------

Limit (cost=61029.94..61029.94 rows=1 width=41) (actual time=587.193..587.193 rows=1 loops=1)

-> Sort (cost=61029.94..63172.22 rows=856911 width=41) (actual time=587.185..587.185 rows=1 loops=1)

Sort Key: (random())

Sort Method: top-N heapsort Memory: 25kB

-> Seq Scan on t_scld_cprscjl (cost=0.00..56745.39 rows=856911 width=41) (actual time=0.019..380.139 rows=854682 loop

s=1)

Planning time: 1.179 ms

Execution time: 587.242 ms

(7 rows)

--表总数量

db_jdsjpt=# select count(*) from db_scld.t_scld_cprscjl;

count

--------

854682

(1 row)

随机获取一条记录random()

random()耗时:Time: 389.818 ms

--随机获取一条耗时

db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl order by random() LIMIT 1;

c_bh

----------------------------------

6d861b011c854040bf5b731f49d40b48

(1 row)

Time: 389.818 ms

改写1

offset耗时:Time: 60.022 ms

--offset可以走索引,少了排序操作

db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl offset floor(random()*854682) LIMIT 1;

c_bh

----------------------------------

f90301bd8ac2485196ffae32ee70345c

(1 row)

Time: 60.022 ms

db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl offset floor(random()*854682) LIMIT 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------

Limit (cost=3747.64..3747.68 rows=1 width=33) (actual time=30.758..30.759 rows=1 loops=1)

-> Index Only Scan using i_corscjl_cprscbh_ on t_scld_cprscjl (cost=0.42..37472.65 rows=854682 width=33) (actual time=0.

047..25.808 rows=81993 loops=1)

Heap Fetches: 0

Planning time: 0.228 ms

Execution time: 30.802 ms

(5 rows)

Time: 31.779 ms

改写2

pg从9.5开始提供抽样函数

使用tablesample抽样的过程中比例不能太低,否则可能获取不到结果,且不能带有过滤条件

system耗时: Time: 0.639 ms

system:随机性较差,效率高

--改写后耗时

db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl tablesample system(0.1) limit 1;

c_bh

----------------------------------

e2fce25399db42f0bf49faf8e7214d5f

(1 row)

Time: 0.639 ms

--system随机抽样以块为单位所以更快

db_jdsjpt=# explain analyze select c_bh from db_scld.t_scld_cprscjl tablesample system(0.1) limit 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------

Limit (cost=0.00..0.23 rows=1 width=33) (actual time=0.105..0.105 rows=1 loops=1)

-> Sample Scan on t_scld_cprscjl (cost=0.00..192.55 rows=855 width=33) (actual time=0.102..0.102 rows=1 loops=1)

Sampling: system ('0.1'::real)

Planning time: 0.190 ms

Execution time: 0.134 ms

(5 rows)

Time: 1.182 ms

改写3

bernoulli:随机性更好,但效率比system要低

bernoullih耗时:Time: 0.822 ms

db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl tablesample bernoulli(0.1) limit 1;

c_bh

----------------------------------

7ec30761ffd04bd9ad77797a33645a84

(1 row)

Time: 0.822 ms

--bernoulli以行为单位进行抽样,比system效率低点

db_jdsjpt=# explain analyze select c_bh from db_scld.t_scld_cprscjl tablesample bernoulli(0.1) limit 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------

Limit (cost=0.00..53.85 rows=1 width=33) (actual time=1.410..1.411 rows=1 loops=1)

-> Sample Scan on t_scld_cprscjl (cost=0.00..46042.55 rows=855 width=33) (actual time=1.408..1.408 rows=1 loops=1)

Sampling: bernoulli ('0.1'::real)

Planning time: 0.446 ms

Execution time: 1.436 ms

(5 rows)

Time: 25.770 ms

同理另外一条sql也可用同样的方式,且在c_bmbm字段上面加上索引

当有条件的时候可以使用offset获取,offset的值也可以通过for循环传入

db_jdsjpt=# select count(*) from db_scld.t_jdry;

count

--------

214819

(1 row)

db_jdsjpt=# select c_jdrybm from db_scld.t_jdry where c_bmbm = '4402222804' and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13') offset floor(random()*214819) limit 1;

c_jdrybm

----------

(0 rows)

Time: 1.924 ms

对比

方法

耗时

order by random()

389.818 ms

offset n

60.022 ms-240ms

system()

0.639 ms

bernoulli()

0.822 ms

使用offset的时候和n的大小有关系,当n越大,扫描的索引块越多,就越大,但是相对于order by random()的方式仍然要快。

注意

system(0.1)等于百分之零点一,也就是抽样千分之一 854682*0.001=854,大概每次抽取854条记录

--system

db_jdsjpt=# select count(*) from db_scld.t_scld_cprscjl tablesample system(0.1) ;

count

-------

592

(1 row)

Time: 1.499 ms

--bernoulli

db_jdsjpt=# select count(*) from db_scld.t_scld_cprscjl tablesample bernoulli(0.1) ;

count

-------

840

(1 row)

Time: 86.037 ms

这里可以看出bernoulli效率比system要低

结语

1.随机获取表中的一条数据,当表中数据较小时使用random感觉不明显,当数据量大时random由于每次都需要排序操作,导致随机获取一条的成本较高

4.随机获取一条记录可以使用limit 1 offset n-1的方式,或者使用随机抽样的方式

5.无论是使用limit 1 offset n还是使用tablesample随机抽样方式都需要知道表中的数据量,不能超过表数据量

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值