PostgreSQL 中 min() 和 order by limit 1 的对比

 

在实际应用中,我们经常需要计算数据库中某张表的某个字段的的最小值/最大值,并用它来作为查询条件。

使用SQL求数据库中表字段的最小值/最大值,通常由两种方法:

一种是 使用聚集函数 min()/max(), 另一种是使用order BY + “字段” 子句,对该表排序之后,取第一条记录中该字段的值。

本文在PostgreSQL 11.4上,对min() 和 order by … limit 1 以及它们的变体做一个全方位的对比。

 

现在有一张表tb_person,它的定义如下:

 

create table tb_person (

    person_id integer not null,

    person_name varchar(64),

    sex int,

    birthday date,

    phone varchar(64),

    job_no varchar(48),

    department_id int,

    create_time timestamp with time zone,

    update_time timestamp with time zone,

    constraint pk_tb_person primary key (person_id)

);

comment on table  tb_person is '人员表';

comment on column tb_person.person_id is '主键id';

comment on column tb_person.person_name is '人员姓名';

comment on column tb_person.sex is '性别。0:男性;1:女性';

comment on column tb_person.birthday is '出生日期';

comment on column tb_person.phone is '联系电话';

comment on column tb_person.job_no is '员工工号';

comment on column tb_person.department_id is '所属部门id';

comment on column tb_person.create_time is '创建时间';

comment on column tb_person.update_time is '更新时间';

 

create index idx_tb_person_job_no on tb_person (job_no);

create index idx_tb_person_department_id on tb_person (department_id);

create index idx_tb_person_birthday on tb_person (birthday);

 

我们通过下面的sql 向表中插入300000 行数据:

insert into tb_person (person_id, person_name, sex, birthday, phone, job_no, department_id, create_time, update_time)

select num, substr('abcdefghijklmnopqrstuvwxyz', mod(num,26), 10), ceil(random()*2), '1970-01-01'::date + (interval '1 hour') * num, null,

to_char(num, 'FM000000'), ceil(random()*3000), now(), now() from

(select generate_series(1,300000) as num) a;

 

接下来我们通过一些查询案例来对比。

 

1. 查找人员表中最小的工号

获得人员表中最小的工号,有两种方法:

select min(job_no) from tb_person;

select job_no from tb_person order by job_no limit 1;

 

 

现在我们用 explain analyze,查看两种方法的查询计划,比较它们的性能。我们主要关注cost 和 execution time 两个指标。

 

查看第一种方法的查询计划:

explain analyze

select min(job_no) from tb_person;

 

结果如下:

Result  (cost=0.45..0.46 rows=1 width=32) (actual time=0.029..0.030 rows=1 loops=1)

  InitPlan 1 (returns $0)

    ->  Limit  (cost=0.42..0.45 rows=1 width=32) (actual time=0.023..0.024 rows=1 loops=1)

          ->  Index Only Scan using idx_tb_person_job_no on tb_person  (cost=0.42..8550.42 rows=300000 width=32) (actual time=0.020..0.020 rows=1 loops=1)

                Index Cond: (job_no IS NOT NULL)

                Heap Fetches: 0

Planning time: 0.161 ms

Execution time: 0.060 ms

 

查看第二种方法的查询计划:

explain analyze

select job_no from tb_person order by job_no limit 1;

 

结果如下:

Limit  (cost=0.42..0.46 rows=1 width=61) (actual time=0.032..0.033 rows=1 loops=1)

  ->  Index Scan using idx_tb_person_job_no on tb_person  (cost=0.42..11213.42 rows=300000 width=61) (actual time=0.028..0.028 rows=1 loops=1)

Planning time: 0.182 ms

Execution time: 0.069 ms

 

从上面的执行时间和代价估计可以看出,两种方法没有明显性能差别。

 

2. 查找工号最小的人

查找工号最小的人也有两种方法。

第一种方法是:

select * from tb_person where job_no = ( select min(job_no) from tb_person);

 

第二种方法是:

select * from tb_person order by job_no limit 1;

 

下面我们比较两种方法的查询计划:

查看第一种方法的查询计划:

explain analyze select * from tb_person where job_no = ( select min(job_no) from tb_person );

 

结果如下:

Index Scan using idx_tb_person_job_no on tb_person  (cost=0.88..8.90 rows=1 width=61) (actual time=0.129..0.131 rows=1 loops=1)

  Index Cond: ((job_no)::text = $1)

  InitPlan 2 (returns $1)

    ->  Result  (cost=0.45..0.46 rows=1 width=32) (actual time=0.084..0.085 rows=1 loops=1)

          InitPlan 1 (returns $0)

            ->  Limit  (cost=0.42..0.45 rows=1 width=32) (actual time=0.077..0.078 rows=1 loops=1)

                  ->  Index Only Scan using idx_tb_person_job_no on tb_person tb_person_1  (cost=0.42..8550.42 rows=300000 width=32) (actual time=0.072..0.072 rows=1 loops=1)

                        Index Cond: (job_no IS NOT NULL)

                        Heap Fetches: 0

Planning time: 0.255 ms

Execution time: 0.179 ms

 

查看第二种方法的查询计划:

explain analyze

select * from tb_person order by job_no limit 1;

 

结果如下:

Limit  (cost=0.42..0.46 rows=1 width=61) (actual time=0.021..0.022 rows=1 loops=1)

  ->  Index Scan using idx_tb_person_job_no on tb_person  (cost=0.42..11213.42 rows=300000 width=61) (actual time=0.018..0.018 rows=1 loops=1)

Planning time: 0.164 ms

Execution time: 0.054 ms

 

可以看出,第二种方法的性能优于第一种。第一种方法做了两次扫描,而第二种方法只做了一次。

 

3. 求每个部门工号最小的人的id

这里需要根据department_id分组。实现它也有两种方法。

第一种方法:

select person_id from tb_person where job_no in (

select min(job_no) from tb_person group by department_id

);

这种方法是根据 department_id 将人员数据分组,获取每个部门最小的job_no,再根据job_no 获取对应的person_id

 

第二种方法:

select (array_agg(person_id order by job_no))[1] from tb_person group by department_id

 

这种方法是根据department_id分组,将每个组的person_id 聚合成数组,聚合是按照job_no递增的顺序的,并取每个数组中第一个元素,即每个部门最小job_no对应的person_id

 

下面我们比较两种方法的性能。

查看第一种方法的查询计划:

explain analyze

select person_id from tb_person where job_no in (

select min(job_no) from tb_person group by department_id

);

 

结果如下:

Nested Loop  (cost=6291.75..6468.99 rows=9 width=4) (actual time=394.590..394.990 rows=21 loops=1)

  ->  HashAggregate  (cost=6291.33..6291.54 rows=21 width=32) (actual time=394.544..394.564 rows=21 loops=1)

        Group Key: min((tb_person_1.job_no)::text)

        ->  Finalize GroupAggregate  (cost=6290.54..6291.07 rows=21 width=36) (actual time=394.320..394.512 rows=21 loops=1)

              Group Key: tb_person_1.department_id

              ->  Sort  (cost=6290.54..6290.65 rows=42 width=36) (actual time=394.303..394.361 rows=63 loops=1)

                    Sort Key: tb_person_1.department_id

                    Sort Method: quicksort  Memory: 27kB

                    ->  Gather  (cost=6285.00..6289.41 rows=42 width=36) (actual time=393.681..394.231 rows=63 loops=1)

                          Workers Planned: 2

                          Workers Launched: 2

                          ->  Partial HashAggregate  (cost=5285.00..5285.21 rows=21 width=36) (actual time=305.755..305.774 rows=21 loops=3)

                                Group Key: tb_person_1.department_id

                                ->  Parallel Seq Scan on tb_person tb_person_1  (cost=0.00..4660.00 rows=125000 width=11) (actual time=0.015..117.754 rows=100000 loops=3)

  ->  Index Scan using idx_tb_person_job_no on tb_person  (cost=0.42..8.44 rows=1 width=11) (actual time=0.013..0.015 rows=1 loops=21)

        Index Cond: ((job_no)::text = (min((tb_person_1.job_no)::text)))

Planning time: 0.351 ms

Execution time: 403.014 ms

 

查看第二种方法的查询计划:

explain analyze

select (array_agg(person_id order by job_no))[1] from tb_person group by department_id

 

结果如下:

GroupAggregate  (cost=0.42..23822.21 rows=21 width=8) (actual time=67.788..2884.911 rows=21 loops=1)

  Group Key: department_id

  ->  Index Scan using idx_tb_person_department_id on tb_person  (cost=0.42..22321.95 rows=300000 width=15) (actual time=0.017..382.140 rows=300000 loops=1)

Planning time: 0.119 ms

Execution time: 2885.164 ms

 

可以看出,当分组字段的每个值对应的行数较多时(这里每个department_id对应100条数据),第二种方法的性能远远不如第一种。

 

4. 每个部门中抽选一人

这个问题也有两种实现。

第一种方法:

select min(person_id) , department_id from tb_person group by department_id

每个部门中抽选person_id最小的人。

 

第二种方法:

select distinct on (department_id) person_id,person_name,department_id  from tb_person order by department_id

每个部门抽选一人,不关注person_id的字典序。

 

 

比较下列两种方法的性能

查看第一种方法的查询计划:

explain analyze

 select min(person_id) , department_id from tb_person group by department_id

 

结果如下:

Finalize GroupAggregate  (cost=6290.54..6291.07 rows=21 width=8) (actual time=289.036..289.159 rows=21 loops=1)

  Group Key: department_id

  ->  Sort  (cost=6290.54..6290.65 rows=42 width=8) (actual time=289.025..289.079 rows=63 loops=1)

        Sort Key: department_id

        Sort Method: quicksort  Memory: 27kB

        ->  Gather  (cost=6285.00..6289.41 rows=42 width=8) (actual time=288.456..288.945 rows=63 loops=1)

              Workers Planned: 2

              Workers Launched: 2

              ->  Partial HashAggregate  (cost=5285.00..5285.21 rows=21 width=8) (actual time=210.916..210.938 rows=21 loops=3)

                    Group Key: department_id

                    ->  Parallel Seq Scan on tb_person  (cost=0.00..4660.00 rows=125000 width=8) (actual time=0.013..107.747 rows=100000 loops=3)

Planning time: 0.149 ms

Execution time: 296.498 ms

 

查看第二种方法的查询计划:

explain analyze

select distinct on (department_id)  person_id, department_id  from tb_person order by department_id

 

     结果如下:

Unique  (cost=0.42..23071.95 rows=21 width=8) (actual time=0.021..676.080 rows=21 loops=1)

  ->  Index Scan using idx_tb_person_department_id on tb_person  (cost=0.42..22321.95 rows=300000 width=8) (actual time=0.017..426.196 rows=300000 loops=1)

Planning time: 0.105 ms

Execution time: 676.146 ms

 

从上面的结果可以,看出第一种方法的性能好一些。

 

5. 查找每个部门中最年长的人

接下来查找每个部门中最年长的人。若两人出生日期,部门均相同,则任取一个,这个需求也有三种sql 实现。

 

第一种方法:

select min(person_id) from tb_person inner join (

 select min(birthday) as birthday, department_id from tb_person group by department_id

) a using (birthday, department_id) GROUP BY department_id;

 

第二种方法:

select (array_agg(person_id order by birthday))[1] from tb_person group by department_id

 

第三种方法:

SELECT distinct on (a.department_id) person_id from tb_person inner join

(

    select min(birthday) as birthday, department_id from tb_person group by department_id

) a using(birthday,department_id) order by a.department_id;

 

 

比较他们的性能:

查看第一种方法的查询计划:

explain analyze

select min(person_id) from tb_person inner join (

 select min(birthday) as birthday, department_id from tb_person group by department_id

) a using (birthday, department_id) GROUP BY department_id;

 

结果如下:

GroupAggregate  (cost=6295.27..9269.40 rows=17 width=8) (actual time=295.161..296.710 rows=21 loops=1)

  Group Key: tb_person.department_id

  ->  Nested Loop  (cost=6295.27..9269.14 rows=17 width=8) (actual time=295.067..296.616 rows=53 loops=1)

        ->  Finalize GroupAggregate  (cost=6290.54..6291.07 rows=21 width=8) (actual time=294.975..295.122 rows=21 loops=1)

              Group Key: tb_person_1.department_id

              ->  Sort  (cost=6290.54..6290.65 rows=42 width=8) (actual time=294.964..295.022 rows=63 loops=1)

                    Sort Key: tb_person_1.department_id

                    Sort Method: quicksort  Memory: 27kB

                    ->  Gather  (cost=6285.00..6289.41 rows=42 width=8) (actual time=294.303..294.874 rows=63 loops=1)

                          Workers Planned: 2

                          Workers Launched: 2

                          ->  Partial HashAggregate  (cost=5285.00..5285.21 rows=21 width=8) (actual time=209.180..209.203 rows=21 loops=3)

                                Group Key: tb_person_1.department_id

                                ->  Parallel Seq Scan on tb_person tb_person_1  (cost=0.00..4660.00 rows=125000 width=8) (actual time=0.015..106.866 rows=100000 loops=3)

        ->  Bitmap Heap Scan on tb_person  (cost=4.73..141.78 rows=2 width=12) (actual time=0.031..0.056 rows=3 loops=21)

              Recheck Cond: (birthday = (min(tb_person_1.birthday)))

              Filter: (tb_person_1.department_id = department_id)

              Rows Removed by Filter: 39

              Heap Blocks: exact=859

              ->  Bitmap Index Scan on idx_tb_person_birthday  (cost=0.00..4.73 rows=41 width=0) (actual time=0.011..0.011 rows=42 loops=21)

                    Index Cond: (birthday = (min(tb_person_1.birthday)))

Planning time: 0.303 ms

Execution time: 303.298 ms

 

查看第二种方法的查询计划:

explain analyze

select (array_agg(person_id order by birthday))[1] from tb_person group by department_id

 

结果如下:

GroupAggregate  (cost=0.42..23822.21 rows=21 width=8) (actual time=27.898..1042.577 rows=21 loops=1)

  Group Key: department_id

  ->  Index Scan using idx_tb_person_department_id on tb_person  (cost=0.42..22321.95 rows=300000 width=12) (actual time=0.022..465.663 rows=300000 loops=1)

Planning time: 0.175 ms

Execution time: 1050.754 ms

 

查看第三种方法的查询计划:

explain analyze

SELECT distinct on (a.department_id) person_id from tb_person inner join

(

select min(birthday) as birthday, department_id from tb_person group by department_id

) a using(birthday,department_id) order by a.department_id;

 

结果如下:

Unique  (cost=6295.27..9269.19 rows=17 width=8) (actual time=294.638..296.429 rows=21 loops=1)

  ->  Nested Loop  (cost=6295.27..9269.14 rows=17 width=8) (actual time=294.634..296.344 rows=53 loops=1)

        ->  Finalize GroupAggregate  (cost=6290.54..6291.07 rows=21 width=8) (actual time=294.538..294.702 rows=21 loops=1)

              Group Key: tb_person_1.department_id

              ->  Sort  (cost=6290.54..6290.65 rows=42 width=8) (actual time=294.506..294.586 rows=63 loops=1)

                    Sort Key: tb_person_1.department_id

                    Sort Method: quicksort  Memory: 27kB

                    ->  Gather  (cost=6285.00..6289.41 rows=42 width=8) (actual time=293.917..294.438 rows=63 loops=1)

                          Workers Planned: 2

                          Workers Launched: 2

                          ->  Partial HashAggregate  (cost=5285.00..5285.21 rows=21 width=8) (actual time=211.328..211.351 rows=21 loops=3)

                                Group Key: tb_person_1.department_id

                                ->  Parallel Seq Scan on tb_person tb_person_1  (cost=0.00..4660.00 rows=125000 width=8) (actual time=0.015..106.619 rows=100000 loops=3)

        ->  Bitmap Heap Scan on tb_person  (cost=4.73..141.78 rows=2 width=12) (actual time=0.036..0.062 rows=3 loops=21)

              Recheck Cond: (birthday = (min(tb_person_1.birthday)))

              Filter: (tb_person_1.department_id = department_id)

              Rows Removed by Filter: 39

              Heap Blocks: exact=859

              ->  Bitmap Index Scan on idx_tb_person_birthday  (cost=0.00..4.73 rows=41 width=0) (actual time=0.013..0.013 rows=42 loops=21)

                    Index Cond: (birthday = (min(tb_person_1.birthday)))

Planning time: 0.298 ms

Execution time: 304.337 ms

 

不出所料的是第二种方法的性能还是最差的,因为生日在同一天的人很多。但令人惊讶的但第一种,第三种方法性能相差无几,他们的执行计划也几乎相同。尽管第三种方法虽然使用了 distinct on … order by。因此distinct on … order by语句的性能并不总是低于 group by语句。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值