Citus DB Installation and Sharding Tests: First Impressions

This article describes how to set up a test server with Vagrant and install PostgreSQL 11 and Citus on it. It walks through installing the packages, creating the database instances, and configuring Citus sharding, including adding worker nodes, creating test tables, and adding primary keys, and finally imports the official sample data and examines how the data is distributed across the worker nodes.


This is an original article; please credit the source when republishing. Visit my GitHub pages site (https://guobo507.github.io) for the latest list of articles.

CitusData: Citus DB is a distributed database product that effectively combines the expressiveness of SQL and the performance of relational databases with the scalability and availability of Hadoop.

Citus Documentation:https://docs.citusdata.com/en/v8.2/

In this article we use Vagrant to quickly set up a test server, then install PostgreSQL 11 and Citus on it for testing.

The Vagrantfile is as follows (path: ~/vagrantdata/pgcitus/):

Vagrant.configure("2") do |config|
  config.vm.hostname = "pgcitus"
  config.vm.box = "centos7/1902_01"
  config.vm.box_check_update = false
  config.vm.network "private_network", ip: "10.128.0.30"
  config.vm.network :forwarded_port, guest: 22, host: 10030, auto_correct: true
#  config.vm.network :forwarded_port, guest: 9000, host: 9000, auto_correct: true
  config.vm.provider "virtualbox" do |vb|
    vb.name = "vagrant_pgcitus"
    vb.gui = false
    vb.memory = "10240"
    vb.cpus = 4
  end
  config.vm.provision "shell", inline: <<-SHELL
    yum install -y openssh-server net-tools wget dstat &> /dev/null
    echo 'LoginGraceTime 0' >> /etc/ssh/sshd_config
    echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config
    echo 'PasswordAuthentication yes' >> /etc/ssh/sshd_config
    sed -i "/^PasswordAuthentication no/d" /etc/ssh/sshd_config
    echo r00tr00t |passwd --stdin root
    systemctl enable sshd
    systemctl restart sshd
    echo "export TZ='Asia/Shanghai'" >> /etc/profile
    echo "export LANG=en_US.UTF-8" >> /etc/profile
    echo "alias df='df -hTP'" >> /etc/profile
    echo "alias df='df -hP'" >> /etc/profile
    echo "alias free='free -h'" >> /etc/profile
    sed -i "/pgcitus/d" /etc/hosts
    sed -i "/SELINUX=/s/enforcing/disabled/g" /etc/selinux/config
    echo >> /etc/hosts
    echo "10.128.0.30    pgcitus" >> /etc/hosts
  SHELL
end

Start the virtual machine:

cd ~/vagrantdata/pgcitus/
vagrant up

1 Installing the Citus and PostgreSQL Packages

The following steps are performed on the virtual machine; connect to it first:

ssh root@10.128.0.30

1.1 Create the postgres user

groupadd postgres
useradd -g postgres postgres
echo postgres |passwd --stdin postgres

1.2 Install the YUM repository using the script provided by Citus

curl https://install.citusdata.com/community/rpm.sh |bash

In fact, you can also install the PGDG repository directly; it contains the Citus packages as well.
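
For reference, a rough sketch of the PGDG-based alternative (the repository RPM URL below is an assumption for CentOS 7, and the Citus package name in PGDG may differ from the one used later in this article; verify both before use):

# install the PGDG repository RPM for CentOS 7 (URL assumed; check the PostgreSQL download page)
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# PostgreSQL 11 packages can then be installed directly from PGDG
yum install -y postgresql11-server postgresql11-contrib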

1.3 Install the PostgreSQL packages

yum install -y postgresql11-server postgresql11-contrib 

1.4 Install the Citus packages

yum install -y  citus82_11 citus82_11-debuginfo

1.5 Create the data directories for each database instance

mkdir -p /u01/pgdata/{master,worker1,worker2,worker3}
chown -R postgres.postgres /u01/*
chmod 700 /u01/pgdata/{master,worker1,worker2,worker3}

2 Creating the Database Instances

2.1 Initialize the master and the three worker instances

su - postgres
echo "export PATH=/usr/pgsql-11/bin:\$PATH" >> ~/.bash_profile
source ~/.bash_profile

echo postgres > ./password.tmp
/usr/pgsql-11/bin/initdb -D /u01/pgdata/master --auth=trust --encoding=utf8 --locale=C --wal-segsize=16 --pwfile=./password.tmp
/usr/pgsql-11/bin/initdb -D /u01/pgdata/worker1 --auth=trust --encoding=utf8 --locale=C --wal-segsize=16 --pwfile=./password.tmp
/usr/pgsql-11/bin/initdb -D /u01/pgdata/worker2 --auth=trust --encoding=utf8 --locale=C --wal-segsize=16 --pwfile=./password.tmp
/usr/pgsql-11/bin/initdb -D /u01/pgdata/worker3 --auth=trust --encoding=utf8 --locale=C --wal-segsize=16 --pwfile=./password.tmp
rm -f ./password.tmp

ls -l /u01/pgdata/*/

2.2 Create an archive log directory for each instance

mkdir -p /u01/pgdata/{master,worker1,worker2,worker3}/pg_archive

2.3 Enable WAL archiving for each instance

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do sed -i "/^#archive_mode/s/^#//g" /u01/pgdata/$d/postgresql.conf; done
for d in $INSTANCE_LIST; do sed -i "/^archive_mode/s/off/on/g" /u01/pgdata/$d/postgresql.conf; done
for d in $INSTANCE_LIST; do sed -i "223 aarchive_command = 'test ! -f /u01/pgdata/$d/pg_archive/%f && cp %p /u01/pgdata/$d/pg_archive/%f'" /u01/pgdata/$d/postgresql.conf; done
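
Note that the "223 a" form appends archive_command after line 223 of postgresql.conf, which assumes the line layout of the stock PostgreSQL 11 configuration file. A quick sanity check of the resulting archive settings (a verification sketch, not part of the original procedure):

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do echo "====> $d:" && grep -E "^(archive_mode|archive_command)" /u01/pgdata/$d/postgresql.conf; done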

2.4 Preload the citus and pg_stat_statements extensions

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do echo "shared_preload_libraries = 'citus,pg_stat_statements'" >> /u01/pgdata/$d/postgresql.conf; done

2.5 Change the listening port of each instance

Set the master instance to port 9000 and the three workers to ports 9001 through 9003. The port setting may still be commented out in the default postgresql.conf, so uncomment it first:

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do sed -i "/^#port = 5432/s/^#//" /u01/pgdata/$d/postgresql.conf; done
sed -i "/^port/s/5432/9000/g" /u01/pgdata/master/postgresql.conf
for i in `seq 1 3`; do sed -i "/^port/s/5432/900$i/g" /u01/pgdata/worker$i/postgresql.conf; done
for d in $INSTANCE_LIST; do grep ^port /u01/pgdata/$d/postgresql.conf; done

2.6 Adjust other parameters of each instance as needed

Adjust the remaining configuration of all instances as needed, such as the listen address, WAL parameters, checkpoint parameters, and logging parameters:

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do
    # listen on all interfaces
    sed -i "/^#listen_addresses/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^listen_addresses/s/localhost/*/g" /u01/pgdata/$d/postgresql.conf
    # memory and connection limits
    sed -i "/^shared_buffers/s/128/512/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^max_connections/s/100/256/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#superuser_reserved_connections/s/^#//g" /u01/pgdata/$d/postgresql.conf
    # WAL and checkpoint settings
    sed -i "/^#wal_level/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#fsync/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#checkpoint_completion_target/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^checkpoint_completion_target/s/0.5/0.9/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#max_wal_senders/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#wal_keep_segments/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^wal_keep_segments/s/0/64/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#effective_cache_size/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^effective_cache_size/s/4/2/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#max_wal_size/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^max_wal_size/s/1/2/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#min_wal_size/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^min_wal_size/s/80/512/g" /u01/pgdata/$d/postgresql.conf
    # logging
    sed -i "/^#log_file_mode/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_checkpoints/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_checkpoints/s/off/on/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_connections/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_connections/s/off/on/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_disconnections/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_disconnections/s/off/on/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_duration/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_duration/s/off/on/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_error_verbosity/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_lock_waits/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_lock_waits/s/off/on/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_statement/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_statement/s/none/all/g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^#log_replication_commands/s/^#//g" /u01/pgdata/$d/postgresql.conf
    sed -i "/^log_replication_commands/s/off/on/g" /u01/pgdata/$d/postgresql.conf
done
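
To spot-check the resulting settings, a quick grep over the modified files (illustrative only):

INSTANCE_LIST='master worker1 worker2 worker3'
for d in $INSTANCE_LIST; do echo "====> $d:" && grep -E "^(listen_addresses|shared_buffers|max_connections|wal_level|shared_preload_libraries)" /u01/pgdata/$d/postgresql.conf; done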

2.7 Create unified cluster management scripts

mkdir ~/scripts
cat << EOF > ~/scripts/start_citus.sh
INSTANCE_LIST='master worker1 worker2 worker3'
for d in \$INSTANCE_LIST; do echo "====> Starting \$d:" && /usr/pgsql-11/bin/pg_ctl -D /u01/pgdata/\$d -l /u01/pgdata/\$d/logfile_\$d start && echo ; done
EOF

cat << EOF > ~/scripts/stop_citus.sh
INSTANCE_LIST='master worker1 worker2 worker3'
for d in \$INSTANCE_LIST; do echo "====> Stopping \$d:" && /usr/pgsql-11/bin/pg_ctl -D /u01/pgdata/\$d stop && echo ; done
EOF

cat << EOF > ~/scripts/reload_citus.sh
INSTANCE_LIST='master worker1 worker2 worker3'
for d in \$INSTANCE_LIST; do echo "====> Reloading \$d:" && /usr/pgsql-11/bin/pg_ctl -D /u01/pgdata/\$d reload && echo ; done
EOF

cat << EOF > ~/scripts/restart_citus.sh
INSTANCE_LIST='master worker1 worker2 worker3'
for d in \$INSTANCE_LIST; do echo "====> Restarting \$d:" && /usr/pgsql-11/bin/pg_ctl -D /u01/pgdata/\$d -l /u01/pgdata/\$d/logfile_\$d restart && echo ; done
EOF

chmod 700 ~/scripts/*_citus.sh

2.8 Start all instances of the Citus cluster

~/scripts/start_citus.sh
# Check the results:
ps -ef |grep pg |grep -v grep
ss -tunlp |grep postgres

2.9 Create the extensions in each instance

2.9.1 Enable the citus and pg_stat_statements extensions on the master instance

psql -p 9000 -c "CREATE EXTENSION citus;"
psql -p 9000 -d template1 -c "CREATE EXTENSION citus;"

psql -p 9000 -c "CREATE EXTENSION pg_stat_statements;"
psql -p 9000 -d template1 -c "CREATE EXTENSION pg_stat_statements;"

2.9.2 Enable the citus and pg_stat_statements extensions on the three worker instances

for i in `seq 1 3`; do psql -p 900$i -c "CREATE EXTENSION citus;"; done
for i in `seq 1 3`; do psql -p 900$i -d template1 -c "CREATE EXTENSION citus;"; done

for i in `seq 1 3`; do psql -p 900$i -c "CREATE EXTENSION pg_stat_statements;"; done
for i in `seq 1 3`; do psql -p 900$i -d template1 -c "CREATE EXTENSION pg_stat_statements;"; done
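
To confirm that both extensions are present in every instance, the pg_extension catalog can be queried (a quick check, not part of the original steps):

for p in 9000 9001 9002 9003; do echo "====> Instance on port $p:" && psql -p $p -c "select extname, extversion from pg_extension order by extname;"; done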

3 Configuring Citus Sharding

3.1 Add the three worker nodes on the master node

Add the three worker instances to the master node as backend nodes:

psql -p 9000 -c "SELECT * from master_add_node('localhost', 9001);"
psql -p 9000 -c "SELECT * from master_add_node('localhost', 9002);"
psql -p 9000 -c "SELECT * from master_add_node('localhost', 9003);"

# Check the results:
psql -p 9000 -c "select * from master_get_active_worker_nodes();"

3.2 Create the test tables on the master node

This article uses the three test table definitions provided by the official Citus documentation:

psql -U postgres -p 9000
CREATE TABLE companies (
    id bigint NOT NULL,
    name text NOT NULL,
    image_url text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

CREATE TABLE campaigns (
    id bigint NOT NULL,
    company_id bigint NOT NULL,
    name text NOT NULL,
    cost_model text NOT NULL,
    state text NOT NULL,
    monthly_budget bigint,
    blacklisted_site_urls text[],
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

CREATE TABLE ads (
    id bigint NOT NULL,
    company_id bigint NOT NULL,
    campaign_id bigint NOT NULL,
    name text NOT NULL,
    image_url text,
    target_url text,
    impressions_count bigint DEFAULT 0,
    clicks_count bigint DEFAULT 0,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

-- Check the results:
\dt

Note: at this point, these three tables do not yet exist on the worker nodes.
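
This can be verified by querying pg_class on the workers; before the tables are distributed, no ads-related relations exist there (a quick check, run as the postgres user):

for p in 9001 9002 9003; do echo "====> Worker Instance $p:" && psql -U postgres -p $p -c "select relname from pg_class where relname like 'ads%' and relkind = 'r';"; done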

3.3 Add primary keys to all test tables

Run the following statements to add primary keys to the three tables above so that replication can be enabled:

ALTER TABLE companies ADD PRIMARY KEY (id);
ALTER TABLE campaigns ADD PRIMARY KEY (id, company_id);
ALTER TABLE ads ADD PRIMARY KEY (id, company_id);

3.4 Distribute all test tables with Citus

Run the following statements to register the three tables with Citus and set the distribution column:

SELECT create_distributed_table('companies', 'id');
SELECT create_distributed_table('campaigns', 'company_id');
SELECT create_distributed_table('ads', 'company_id');

Note: at this point the three tables do exist on the worker nodes. The tables on worker1 are shown below:

[postgres@pgcitus ~]$ psql -U postgres -p 9001 -c "select relname as TABLE_NAME from pg_class where relname like 'ads_1%' and relkind = 'r';"
 table_name
------------
 ads_102096
 ads_102102
 ads_102072
 ads_102084
 ads_102090
 ads_102078
 ads_102081
 ads_102087
 ads_102075
 ads_102093
 ads_102099
(11 rows)

[postgres@pgcitus ~]$ psql -U postgres -p 9001 -c "select relname as TABLE_NAME from pg_class where relname like 'campaigns_1%' and relkind = 'r';"
    table_name
------------------
 campaigns_102046
 campaigns_102052
 campaigns_102067
 campaigns_102049
 campaigns_102064
 campaigns_102055
 campaigns_102040
 campaigns_102058
 campaigns_102043
 campaigns_102070
 campaigns_102061
(11 rows)

[postgres@pgcitus ~]$ psql -U postgres -p 9001 -c "select relname as TABLE_NAME from pg_class where relname like 'companies_1%' and relkind = 'r';"
    table_name
------------------
 companies_102011
 companies_102026
 companies_102014
 companies_102035
 companies_102008
 companies_102032
 companies_102038
 companies_102029
 companies_102017
 companies_102020
 companies_102023
(11 rows)

As you can see, the shards of all three tables are numbered automatically. If you look closely, you will also notice the naming pattern: on worker1, ads_102072 is followed by ads_102075, while ads_102073 lives on worker2 and ads_102074 on worker3, so the shard numbering runs consecutively across the three worker nodes.
You can run the following query on the master instance to view the shard IDs:

postgres=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
shardid | shardstate | shardlength | nodename  | nodeport | placementid
---------+------------+-------------+-----------+----------+-------------
102008 |          1 |           0 | localhost |     9001 |           1
102009 |          1 |           0 | localhost |     9002 |           2
102010 |          1 |           0 | localhost |     9003 |           3
102011 |          1 |           0 | localhost |     9001 |           4
102012 |          1 |           0 | localhost |     9002 |           5
102013 |          1 |           0 | localhost |     9003 |           6
102014 |          1 |           0 | localhost |     9001 |           7
102015 |          1 |           0 | localhost |     9002 |           8
102016 |          1 |           0 | localhost |     9003 |           9
102017 |          1 |           0 | localhost |     9001 |          10
102018 |          1 |           0 | localhost |     9002 |          11
102019 |          1 |           0 | localhost |     9003 |          12
102020 |          1 |           0 | localhost |     9001 |          13
102021 |          1 |           0 | localhost |     9002 |          14
102022 |          1 |           0 | localhost |     9003 |          15
102023 |          1 |           0 | localhost |     9001 |          16
102024 |          1 |           0 | localhost |     9002 |          17
102025 |          1 |           0 | localhost |     9003 |          18
102026 |          1 |           0 | localhost |     9001 |          19
102027 |          1 |           0 | localhost |     9002 |          20
102028 |          1 |           0 | localhost |     9003 |          21
102029 |          1 |           0 | localhost |     9001 |          22
102030 |          1 |           0 | localhost |     9002 |          23
102031 |          1 |           0 | localhost |     9003 |          24
102032 |          1 |           0 | localhost |     9001 |          25
102033 |          1 |           0 | localhost |     9002 |          26
102034 |          1 |           0 | localhost |     9003 |          27
102035 |          1 |           0 | localhost |     9001 |          28
102036 |          1 |           0 | localhost |     9002 |          29
102037 |          1 |           0 | localhost |     9003 |          30
102038 |          1 |           0 | localhost |     9001 |          31
102039 |          1 |           0 | localhost |     9002 |          32
102040 |          1 |           0 | localhost |     9001 |          33
102041 |          1 |           0 | localhost |     9002 |          34
102042 |          1 |           0 | localhost |     9003 |          35
102043 |          1 |           0 | localhost |     9001 |          36
102044 |          1 |           0 | localhost |     9002 |          37
102045 |          1 |           0 | localhost |     9003 |          38
102046 |          1 |           0 | localhost |     9001 |          39
102047 |          1 |           0 | localhost |     9002 |          40
102048 |          1 |           0 | localhost |     9003 |          41
102049 |          1 |           0 | localhost |     9001 |          42
102050 |          1 |           0 | localhost |     9002 |          43
102051 |          1 |           0 | localhost |     9003 |          44
102052 |          1 |           0 | localhost |     9001 |          45
102053 |          1 |           0 | localhost |     9002 |          46
102054 |          1 |           0 | localhost |     9003 |          47
102055 |          1 |           0 | localhost |     9001 |          48
102056 |          1 |           0 | localhost |     9002 |          49
102057 |          1 |           0 | localhost |     9003 |          50
102058 |          1 |           0 | localhost |     9001 |          51
102059 |          1 |           0 | localhost |     9002 |          52
102060 |          1 |           0 | localhost |     9003 |          53
102061 |          1 |           0 | localhost |     9001 |          54
102062 |          1 |           0 | localhost |     9002 |          55
102063 |          1 |           0 | localhost |     9003 |          56
102064 |          1 |           0 | localhost |     9001 |          57
102065 |          1 |           0 | localhost |     9002 |          58
102066 |          1 |           0 | localhost |     9003 |          59
102067 |          1 |           0 | localhost |     9001 |          60
102068 |          1 |           0 | localhost |     9002 |          61
102069 |          1 |           0 | localhost |     9003 |          62
102070 |          1 |           0 | localhost |     9001 |          63
102071 |          1 |           0 | localhost |     9002 |          64
102072 |          1 |           0 | localhost |     9001 |          65
102073 |          1 |           0 | localhost |     9002 |          66
102074 |          1 |           0 | localhost |     9003 |          67
102075 |          1 |           0 | localhost |     9001 |          68
102076 |          1 |           0 | localhost |     9002 |          69
102077 |          1 |           0 | localhost |     9003 |          70
102078 |          1 |           0 | localhost |     9001 |          71
102079 |          1 |           0 | localhost |     9002 |          72
102080 |          1 |           0 | localhost |     9003 |          73
102081 |          1 |           0 | localhost |     9001 |          74
102082 |          1 |           0 | localhost |     9002 |          75
102083 |          1 |           0 | localhost |     9003 |          76
102084 |          1 |           0 | localhost |     9001 |          77
102085 |          1 |           0 | localhost |     9002 |          78
102086 |          1 |           0 | localhost |     9003 |          79
102087 |          1 |           0 | localhost |     9001 |          80
102088 |          1 |           0 | localhost |     9002 |          81
102089 |          1 |           0 | localhost |     9003 |          82
102090 |          1 |           0 | localhost |     9001 |          83
102091 |          1 |           0 | localhost |     9002 |          84
102092 |          1 |           0 | localhost |     9003 |          85
102093 |          1 |           0 | localhost |     9001 |          86
102094 |          1 |           0 | localhost |     9002 |          87
102095 |          1 |           0 | localhost |     9003 |          88
102096 |          1 |           0 | localhost |     9001 |          89
102097 |          1 |           0 | localhost |     9002 |          90
102098 |          1 |           0 | localhost |     9003 |          91
102099 |          1 |           0 | localhost |     9001 |          92
102100 |          1 |           0 | localhost |     9002 |          93
102101 |          1 |           0 | localhost |     9003 |          94
102102 |          1 |           0 | localhost |     9001 |          95
102103 |          1 |           0 | localhost |     9002 |          96
(96 rows)

4 Importing Sample Data for Verification

4.1 Download the sample data for the test tables from Citus

This example uses the sample data that Citus provides for the example tables. Run the following commands to download it:

mkdir ~/citus_data
cd ~/citus_data
curl https://examples.citusdata.com/tutorial/companies.csv > companies.csv
curl https://examples.citusdata.com/tutorial/campaigns.csv > campaigns.csv
curl https://examples.citusdata.com/tutorial/ads.csv > ads.csv

4.2 Import the sample data

cd ~/citus_data
psql -U postgres -p 9000
\copy companies from 'companies.csv' with csv
\copy campaigns from 'campaigns.csv' with csv
\copy ads from 'ads.csv' with csv
-- Check the results:
select count(*) from companies;		-- 100 rows in total
select count(*) from campaigns;		-- 978 rows in total
select count(*) from ads;			-- 7364 rows in total

4.3 Check the data distribution across the worker nodes

Use the following statement to view the shard names and row counts on each of the three worker nodes:

[root@pgcitus ~]# su - postgres
[postgres@pgcitus ~]$ for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'ads_10%' and relkind = 'r' order by ROW_COUNT desc;"; done
===> Worker Instance 9001:
 table_name | row_count
------------+-----------
 ads_102075 |       434
 ads_102102 |       407
 ads_102072 |       270
 ads_102084 |       252
 ads_102093 |       219
 ads_102096 |       193
 ads_102081 |       183
 ads_102078 |       134
 ads_102087 |       134
 ads_102099 |       109
 ads_102090 |         0
(11 rows)

===> Worker Instance 9002:
 table_name | row_count
------------+-----------
 ads_102100 |       471
 ads_102103 |       380
 ads_102073 |       306
 ads_102091 |       274
 ads_102094 |       266
 ads_102076 |       221
 ads_102079 |       195
 ads_102088 |       160
 ads_102097 |       134
 ads_102085 |        96
 ads_102082 |        92
(11 rows)

===> Worker Instance 9003:
 table_name | row_count
------------+-----------
 ads_102092 |       502
 ads_102086 |       466
 ads_102080 |       457
 ads_102101 |       245
 ads_102095 |       191
 ads_102083 |       157
 ads_102074 |       128
 ads_102089 |       101
 ads_102077 |        98
 ads_102098 |        89
(10 rows)

Similarly, the following statements show the data distribution of the companies and campaigns shards:

for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'companies_10%' and relkind = 'r' order by ROW_COUNT desc;"; done
for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'campaigns_10%' and relkind = 'r' order by ROW_COUNT desc;"; done

The following statements sum the row counts of all shards of each of the three test tables on the three worker instances:

for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select sum(ROW_COUNT) from (select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'ads_10%' and relkind = 'r' order by ROW_COUNT desc) as ADS_COUNT;"; done
for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select sum(ROW_COUNT) from (select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'companies_1%' and relkind = 'r' order by ROW_COUNT desc) as ADS_COUNT;"; done
for p in 9001 9002 9003; do echo "===> Worker Instance $p: " && psql -U postgres -p $p -c "select sum(ROW_COUNT) from (select relname as TABLE_NAME, reltuples as ROW_COUNT from pg_class where relname like 'campaigns_1%' and relkind = 'r' order by ROW_COUNT desc) as ADS_COUNT;"; done

As you can see, Citus has distributed the sample data across the three worker instances roughly evenly, according to its sharding rules.
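
As a final sanity check that the coordinator fans queries out across the shards transparently, a distributed aggregate can be run against the master instance (an illustrative query, not part of the original test):

psql -U postgres -p 9000 -c "select company_id, count(*) as ads_per_company from ads group by company_id order by 2 desc limit 5;"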
