
The Postgres 2015 China User Conference will be held on November 20-21 at the 丽亭华苑 Hotel in Beijing. The conference features a strong lineup: the top PostgreSQL experts in China will all be there, joined by invited database experts from Europe, Russia, Japan, the United States, and other countries and regions:
- SUZUKI Koichi (铃木市一), founder of the Postgres-XC project
- Mason Sharp, founder of the Postgres-XL project
- Tatsuo Ishii (石井达夫), author of pgpool
- Kohei KaiGai (海外浩平), author of PG-Strom
- Yao Yandong (姚延栋), R&D director at Greenplum
- Zhou Zhengzhong (周正中, "digoal"), one of the founders of the PostgreSQL China User Group
- Wang Yang (汪洋), manager of the database technology department at Ping An Technology
- ……
The typical bottleneck in a flash-sale (seckill) scenario is a burst of update requests against the same record, of which only one or a handful succeed; the rest either fail or find nothing left to update.
For example, a 1-yuan flash sale for an iPhone: if only one unit is released, we can treat it as a single record. Once the sale starts, whoever grabs it first (i.e. acquires the lock on that record) wins.
For example:
Use a flag to mark whether the record has already been updated, or a counter that records how many times it has been updated (i.e. how many units have been sold).
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; -- assume 5 units are available
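For reference, here is a minimal sketch of the table the statement above assumes; it is not part of the original article, and the column xxx is only a placeholder for the real business fields:

```sql
-- Hypothetical DDL matching the update above; tbl, id, xxx and upd_cnt are
-- the names used in the snippet, everything else is assumed.
create table tbl (
  id      int primary key,         -- the seckill item, e.g. the single iPhone record
  xxx     text,                    -- placeholder for whatever business columns get updated
  upd_cnt int not null default 0   -- how many units have been sold so far
);
insert into tbl (id, xxx) values (1, 'iphone');  -- one row per seckill item
```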
The drawbacks of this approach:
The user who acquires the lock may succeed or fail while processing the record, or may take a long time (for example if the database responds slowly); until that transaction ends, every other session can only wait.
Waiting is very wasteful: for the users who did not get the lock, it is nothing but lost time.
So the usual optimization is to use for update nowait to avoid waiting: if the lock cannot be acquired immediately, do not wait at all.
For example:
```sql
begin;
select 1 from tbl where id=pk for update nowait; -- if the lock cannot be acquired immediately, an error is raised and the transaction rolls back
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5;
end;
```
This shortens the users' wait: if the lock cannot be acquired immediately, the request returns right away.
But it still has a drawback: if several units of one product can be sold in the flash sale and we keep them all in a single record, we limit the concurrency of the sale,
because we are relying on a row lock.
There are many ways around this; they all come down to increasing parallelism, for example:
1. Segmented seckill: scatter the stock across several records (segments), so that more requests can be processed in parallel, as in the sketch below.
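As a rough illustration of the segmented idea (my own sketch under assumed names, not code from the original article), the 5 units could be stored as 5 rows so that concurrent requests contend on different row locks:

```sql
-- Hypothetical segmented layout; tbl_seg and its columns are assumed names.
create table tbl_seg (
  seg_id  int primary key,          -- segment number, 1..5
  item_id int not null,             -- which product this segment belongs to
  upd_cnt int not null default 0    -- units sold from this segment
);
insert into tbl_seg (seg_id, item_id)
select g, 1 from generate_series(1, 5) g;

-- each request picks a random segment, spreading contention across 5 row locks
update tbl_seg
   set upd_cnt = upd_cnt + 1
 where seg_id = 1 + (random() * 4)::int
   and item_id = 1
   and upd_cnt + 1 <= 1;            -- each segment holds 1 unit in this sketch
```

A request that lands on an already-sold segment simply updates nothing and can retry against another segment or report failure.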
To sum up,
the idea behind the optimization is to reduce lock waiting, avoid serialization, and process requests in parallel as much as possible.
Is that the end of the optimization? Certainly not. Everything above can be done in any database; if we stopped here, where would the PostgreSQL-specific part be?
PostgreSQL also offers another lock type: the advisory lock. It is much lighter than a row lock and comes in both session-level and transaction-level flavors. (Note that advisory lock IDs are global: every ID used for seckill, or for advisory locks in general, must be globally unique within the database, otherwise unrelated users of the same ID will interfere with each other.)
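To make the two flavors concrete, here is a small illustration using the standard advisory-lock functions (this demo is mine, not from the original text):

```sql
-- Transaction-level: released automatically at commit or rollback.
begin;
select pg_try_advisory_xact_lock(1);   -- returns true if acquired, false immediately otherwise
commit;                                -- the advisory lock is released here

-- Session-level: held until explicitly released or until the session ends.
select pg_advisory_lock(1);            -- blocks until the lock is acquired
select pg_advisory_unlock(1);          -- release it
```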
Example:
update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5 and pg_try_advisory_xact_lock(:id);
Finally, we have to compare the performance of for update nowait and advisory locks.
The tests below were run on a local virtual machine.
Create a seckill table:
```
postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 info   | text    |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
```
It contains a single record, which gets updated over and over:
```
postgres=# select * from t1;
 id |             info
----+-------------------------------
  1 | 2015-09-14 09:47:04.703904+08
(1 row)
```
The pgbench test for the for update nowait approach:
```sql
CREATE OR REPLACE FUNCTION public.f1(i_id integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
declare
begin
  perform 1 from t1 where id=i_id for update nowait;
  update t1 set info=now()::text where id=i_id;
exception when others then
  return;
end;
$function$;
```
```
postgres@digoal-> cat test1.sql
\setrandom id 1 1
select f1(:id);
```
The pgbench test for the advisory lock approach:
```
postgres@digoal-> cat test.sql
\setrandom id 1 1
update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);
```
Reset the statistics before the benchmark:
```
postgres=# select pg_stat_reset();
 pg_stat_reset
---------------

(1 row)

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 0
idx_tup_fetch       | 0
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
```
Benchmark results:
```
postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 792029
latency average: 1.505 ms
latency stddev: 4.275 ms
tps = 13196.542846 (including connections establishing)
tps = 13257.270709 (excluding connections establishing)
statement latencies in milliseconds:
        0.002625  \setrandom id 1 1
        1.502420  select f1(:id);
```
```
postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 896963   // mostly wasted work
idx_tup_fetch       | 896963   // mostly wasted work
n_tup_ins           | 0
n_tup_upd           | 41775
n_tup_del           | 0
n_tup_hot_upd       | 41400
n_live_tup          | 0
n_dead_tup          | 928
n_mod_since_analyze | 41774
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
```
```
postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1392372
latency average: 0.851 ms
latency stddev: 2.475 ms
tps = 23194.831054 (including connections establishing)
tps = 23400.411501 (excluding connections establishing)
statement latencies in milliseconds:
        0.002594  \setrandom id 1 1
        0.848536  update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);
```
```
postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+--------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 1368933   // mostly wasted work
idx_tup_fetch       | 1368933   // mostly wasted work
n_tup_ins           | 0
n_tup_upd           | 54957
n_tup_del           | 0
n_tup_hot_upd       | 54489
n_live_tup          | 0
n_dead_tup          | 1048
n_mod_since_analyze | 54957
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
```
Notice that no matter which approach is used, a large share of the index scans is wasted work.
To get rid of the useless scans, the following function can be used. (An even better solution, of course, would be one that is transparent to the user.)
```sql
CREATE OR REPLACE FUNCTION public.f(i_id integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
declare
  a_lock boolean := false;
begin
  select pg_try_advisory_xact_lock(i_id) into a_lock;
  if a_lock then
    update t1 set info=now()::text where id=i_id;
  end if;
exception when others then
  return;
end;
$function$;
```
```
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1217195
latency average: 0.973 ms
latency stddev: 3.563 ms
tps = 20283.314001 (including connections establishing)
tps = 20490.143363 (excluding connections establishing)
statement latencies in milliseconds:
        0.002703  \setrandom id 1 1
        0.970209  select f(:id);
```
```
postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 75927
idx_tup_fetch       | 75927
n_tup_ins           | 0
n_tup_upd           | 75927
n_tup_del           | 0
n_tup_hot_upd       | 75902
n_live_tup          | 0
n_dead_tup          | 962
n_mod_since_analyze | 75927
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
```
Besides the higher throughput, the number of real updates (successful seckill operations) also increased, so we have not only reduced waiting latency but also raised the effective processing capacity.
Finally, for reference, here are results from a physical server, with 128 concurrent connections all updating the same record.
Concurrency without any optimization:
```
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 285673
latency average: 44.806 ms
latency stddev: 45.751 ms
tps = 2855.547375 (including connections establishing)
tps = 2855.856976 (excluding connections establishing)
statement latencies in milliseconds:
        0.002509   \setrandom id 1 1
        44.803299  update t1 set info=now()::text where id=:id;
```
Concurrency with for update nowait:
```
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 6663253
latency average: 1.919 ms
latency stddev: 2.804 ms
tps = 66623.169445 (including connections establishing)
tps = 66630.307999 (excluding connections establishing)
statement latencies in milliseconds:
        0.001934  \setrandom id 1 1
        1.917297  select f1(:id);
```
Concurrency with advisory locks:
```
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 19154754
latency average: 0.667 ms
latency stddev: 1.054 ms
tps = 191520.550924 (including connections establishing)
tps = 191546.208051 (excluding connections establishing)
statement latencies in milliseconds:
        0.002085  \setrandom id 1 1
        0.664420  select f(:id);
```
With advisory locks, throughput is roughly 66 times higher than with no optimization at all, and roughly 1.8 times higher than with for update nowait.
This optimization tells users right away whether they managed to grab the item, instead of making them wait for other users' updates to finish before finding out; it therefore cuts response time dramatically and raises throughput.
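As a final, hypothetical sketch of how an application could surface that result immediately (the function name try_seckill and the use of GET DIAGNOSTICS are my own additions, building on the t1 example above):

```sql
create or replace function try_seckill(i_id integer) returns boolean
language plpgsql as $$
declare
  rc integer;
begin
  update t1 set info = now()::text
   where id = i_id
     and pg_try_advisory_xact_lock(i_id);
  get diagnostics rc = row_count;   -- 0 rows updated: the lock was not acquired, answer the user right away
  return rc > 0;                    -- true: this seckill attempt succeeded
end;
$$;
```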