
Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
|
![]() |
测试环境 :
CentOS release 5.7 (Final)内存 : 96GBCPU : 2 * Intel(R) Xeon(R) CPU E5504 @ 2.00GHz硬盘 : OCZ RevoDrive3
1. 下载操作系统对应的citusdb包
wget http://packages.citusdata.com/readline-5.0/citusdb-2.0.0-1.x86_64.rpm
2. 安装
su - rootrpm -ivh citusdb-2.0.0-1.x86_64.rpm# 看看citusdb安装包的内容[root@db-172-16-3-150 ~]# rpm -qa|grep cituscitusdb-2.0-2.0.0-1# 通过rpm -ql查看citusdb安装包的内容[root@db-172-16-3-150 ~]# rpm -ql citusdb-2.0-2.0.0-1# 从输出可以看到, 文件被解压到 /opt/citusdb
# 新增用户
[root@db-172-16-3-150 ~]# useradd citusdb
# 修改citusdb权限
[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /opt/citusdb
# 配置用户环境变量
[root@db-172-16-3-150 opt]# su - citusdbcitusdb@db-172-16-3-150-> vi .bash_profileexport PS1="$USER@`/bin/hostname -s`-> "export PGPORT=9900export PGUSER=postgresexport PGDATA=/data06/citusdb/masterexport PGHOST=$PGDATAexport PGDATABASE=digoalexport LANG=en_US.utf8export PGHOME=/opt/citusdb/2.0export LD_LIBRARY_PATH=$PGHOME/lib:$PGHOME/lib/postgresql:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/libexport DATE=`date +"%Y%m%d%H%M"`export PATH=$PGHOME/bin:$PATH:.#export MANPATH=$PGHOME/share/man:$MANPATHexport LD_PRELOAD=/usr/lib64/libncurses.soalias rm='rm -i'alias ll='ls -lh'
#注意这里的LD_LIBRARY_PATH与postgresql 的LD_LIBRARY_PATH路径不一样, 多了一个路径lib/postgresql.
原因如下 :
citusdb@db-172-16-3-150-> cd /opt/citusdb/2.0/lib/citusdb@db-172-16-3-150-> lltotal 984K-rw-r--r-- 1 citusdb citusdb 147K Jan 14 21:04 libecpg.a-rw-r--r-- 1 citusdb citusdb 19K Jan 14 21:04 libecpg_compat.alrwxrwxrwx 1 citusdb citusdb 21 Mar 15 07:54 libecpg_compat.so -> libecpg_compat.so.3.4lrwxrwxrwx 1 citusdb citusdb 21 Mar 15 07:54 libecpg_compat.so.3 -> libecpg_compat.so.3.4-rwxr-xr-x 1 citusdb citusdb 23K Jan 14 21:04 libecpg_compat.so.3.4lrwxrwxrwx 1 citusdb citusdb 14 Mar 15 07:54 libecpg.so -> libecpg.so.6.4lrwxrwxrwx 1 citusdb citusdb 14 Mar 15 07:54 libecpg.so.6 -> libecpg.so.6.4-rwxr-xr-x 1 citusdb citusdb 85K Jan 14 21:04 libecpg.so.6.4-rw-r--r-- 1 citusdb citusdb 65K Jan 14 21:02 libpgport.a-rw-r--r-- 1 citusdb citusdb 99K Jan 14 21:04 libpgtypes.alrwxrwxrwx 1 citusdb citusdb 17 Mar 15 07:54 libpgtypes.so -> libpgtypes.so.3.3lrwxrwxrwx 1 citusdb citusdb 17 Mar 15 07:54 libpgtypes.so.3 -> libpgtypes.so.3.3-rwxr-xr-x 1 citusdb citusdb 68K Jan 14 21:04 libpgtypes.so.3.3-rw-r--r-- 1 citusdb citusdb 262K Jan 14 21:04 libpq.alrwxrwxrwx 1 citusdb citusdb 12 Mar 15 07:54 libpq.so -> libpq.so.5.5lrwxrwxrwx 1 citusdb citusdb 12 Mar 15 07:54 libpq.so.5 -> libpq.so.5.5-rwxr-xr-x 1 citusdb citusdb 172K Jan 14 21:04 libpq.so.5.5drwxr-xr-x 3 citusdb citusdb 4.0K Mar 19 14:16 postgresqlcitusdb@db-172-16-3-150-> cd postgresql/citusdb@db-172-16-3-150-> lltotal 4.4M-rwxr-xr-x 1 citusdb citusdb 7.1K Jan 14 21:04 ascii_and_mic.so-rwxr-xr-x 1 citusdb citusdb 18K Jan 14 21:04 cyrillic_and_mic.so-rwxr-xr-x 1 citusdb citusdb 459K Jan 14 21:04 dict_snowball.so-rwxr-xr-x 1 citusdb citusdb 8.9K Jan 14 21:04 euc2004_sjis2004.so-rwxr-xr-x 1 citusdb citusdb 7.4K Jan 14 21:04 euc_cn_and_mic.so-rwxr-xr-x 1 citusdb citusdb 15K Jan 14 21:04 euc_jp_and_sjis.so-rwxr-xr-x 1 citusdb citusdb 7.5K Jan 14 21:04 euc_kr_and_mic.so-rwxr-xr-x 1 citusdb citusdb 14K Jan 14 21:04 euc_tw_and_big5.so-rwxr-xr-x 1 root root 22K Mar 19 14:16 file_fdw.so-rwxr-xr-x 1 citusdb citusdb 11K Jan 14 21:04 latin2_and_win1250.so-rwxr-xr-x 1 citusdb citusdb 9.3K Jan 14 21:04 latin_and_mic.so-rwxr-xr-x 1 citusdb citusdb 17K Jan 14 21:05 libpqclientexecutor.so-rwxr-xr-x 1 citusdb citusdb 15K Jan 14 21:05 libpqwalreceiver.sodrwxr-xr-x 4 citusdb citusdb 4.0K Mar 15 07:54 pgxs-rwxr-xr-x 1 citusdb citusdb 171K Jan 14 21:05 plpgsql.so-rwxr-xr-x 1 citusdb citusdb 7.1K Jan 14 21:04 utf8_and_ascii.so-rwxr-xr-x 1 citusdb citusdb 224K Jan 14 21:04 utf8_and_big5.so-rwxr-xr-x 1 citusdb citusdb 13K Jan 14 21:04 utf8_and_cyrillic.so-rwxr-xr-x 1 citusdb citusdb 189K Jan 14 21:04 utf8_and_euc2004.so-rwxr-xr-x 1 citusdb citusdb 124K Jan 14 21:04 utf8_and_euc_cn.so-rwxr-xr-x 1 citusdb citusdb 215K Jan 14 21:04 utf8_and_euc_jp.so-rwxr-xr-x 1 citusdb citusdb 136K Jan 14 21:04 utf8_and_euc_kr.so-rwxr-xr-x 1 citusdb citusdb 332K Jan 14 21:04 utf8_and_euc_tw.so-rwxr-xr-x 1 citusdb citusdb 999K Jan 14 21:04 utf8_and_gb18030.so-rwxr-xr-x 1 citusdb citusdb 348K Jan 14 21:04 utf8_and_gbk.so-rwxr-xr-x 1 citusdb citusdb 7.7K Jan 14 21:04 utf8_and_iso8859_1.so-rwxr-xr-x 1 citusdb citusdb 36K Jan 14 21:04 utf8_and_iso8859.so-rwxr-xr-x 1 citusdb citusdb 276K Jan 14 21:04 utf8_and_johab.so-rwxr-xr-x 1 citusdb citusdb 188K Jan 14 21:04 utf8_and_sjis2004.so-rwxr-xr-x 1 citusdb citusdb 128K Jan 14 21:04 utf8_and_sjis.so-rwxr-xr-x 1 citusdb citusdb 279K Jan 14 21:04 utf8_and_uhc.so-rwxr-xr-x 1 citusdb citusdb 31K Jan 14 21:04 utf8_and_win.so
3. 初始化主节点以及work节点数据库
master :
su - root[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/master[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /data06/citusdb[root@db-172-16-3-150 opt]# su - citusdbcitusdb@db-172-16-3-150-> initdb -D /data06/citusdb/master -E UTF8 --locale=C -U postgres -W.. 输入postgres用户密码
worker :
[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work01[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work02[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work03[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work04[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /data06/citusdb\[root@db-172-16-3-150 opt]# su - citusdbcitusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work01 -E UTF8 --locale=C -U postgres -W.. 输入postgres用户密码citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work02 -E UTF8 --locale=C -U postgres -W.. 输入postgres用户密码citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work03 -E UTF8 --locale=C -U postgres -W.. 输入postgres用户密码citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work04 -E UTF8 --locale=C -U postgres -W.. 输入postgres用户密码
4. 操作系统配置
配置sysctl.conf
[root@db-172-16-3-150 ~]# vi /etc/sysctl.conf# Controls the maximum size of a message, in byteskernel.msgmnb = 65536# Controls the default maxmimum size of a mesage queuekernel.msgmax = 65536# Controls the maximum shared segment size, in byteskernel.shmmax = 68719476736# Controls the maximum number of shared memory segments, in pageskernel.shmall = 4294967296kernel.shmmni = 4096kernel.sem = 50100 64128000 50100 1280fs.file-max = 7672460net.ipv4.ip_local_port_range = 9000 65000net.core.rmem_default = 1048576net.core.rmem_max = 4194304net.core.wmem_default = 262144net.core.wmem_max = 1048576net.ipv4.tcp_tw_recycle = 1net.ipv4.tcp_max_syn_backlog = 4096net.core.netdev_max_backlog = 10000net.ipv4.ip_conntrack_max = 655360fs.aio-max-nr = 1048576net.ipv4.tcp_timestamps = 0vm.overcommit_memory = 0[root@db-172-16-3-150 ~]# sysctl -p
配置
/etc/security/limits.conf
[root@db-172-16-3-150 ~]# vi /etc/security/limits.conf* soft nofile 131072* hard nofile 131072* soft nproc 131072* hard nproc 131072* soft core unlimited* hard core unlimited* soft memlock 50000000* hard memlock 50000000
# 配置模拟work数据库的IP
[root@db-172-16-3-150 opt]# ifconfig lo:1 1.1.1.1/32[root@db-172-16-3-150 opt]# ifconfig lo:2 1.1.1.2/32[root@db-172-16-3-150 opt]# ifconfig lo:3 1.1.1.3/32[root@db-172-16-3-150 opt]# ifconfig lo:4 1.1.1.4/32[root@db-172-16-3-150 opt]# ip route list table localbroadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1broadcast 172.16.3.255 dev eth0 proto kernel scope link src 172.16.3.150local 1.1.1.2 dev lo proto kernel scope host src 1.1.1.1local 1.1.1.3 dev lo proto kernel scope host src 1.1.1.1local 1.1.1.1 dev lo proto kernel scope host src 1.1.1.1broadcast 172.16.3.0 dev eth0 proto kernel scope link src 172.16.3.150local 1.1.1.4 dev lo proto kernel scope host src 1.1.1.1broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1local 172.16.3.150 dev eth0 proto kernel scope host src 172.16.3.150local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
# 由于我这里没有使用DNS, 所以需要配置/etc/hosts
[root@db-172-16-3-150 ~]# vi /etc/hosts# 注意如果没有DNS则需要配置/etc/hosts1.1.1.1 work011.1.1.2 work021.1.1.3 work031.1.1.4 work04
5. 数据库配置
master :
citusdb@db-172-16-3-150-> cd /data06/citusdb/mastercitusdb@db-172-16-3-150-> vi postgresql.conflisten_addresses = '0.0.0.0'port = 9900unix_socket_directory = '.'unix_socket_permissions = 0700tcp_keepalives_idle = 60tcp_keepalives_interval = 10tcp_keepalives_count = 10shared_buffers = 2048MBmaintenance_work_mem = 1024MBmax_stack_depth = 8MBsynchronous_commit = offwal_buffers = 16384kBwal_writer_delay = 10mscheckpoint_segments = 32random_page_cost = 1.0effective_cache_size = 81920MBlog_destination = 'csvlog'logging_collector = onlog_directory = 'pg_log'log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'log_file_mode = 0600log_truncate_on_rotation = onlog_rotation_age = 1dlog_rotation_size = 10MBlog_checkpoints = onlog_connections = onlog_disconnections = onlog_error_verbosity = verboseautovacuum = onlog_autovacuum_min_duration = 0
citusdb@db-172-16-3-150-> vi pg_hba.confhost all all 1.1.1.0/8 trusthost all all 0.0.0.0/0 md5
citusdb@db-172-16-3-150-> vi pg_worker_list.confwork01 9901work02 9902work03 9903work04 9904
worker :
citusdb@db-172-16-3-150-> vi /data06/citusdb/work01/postgresql.confport = 9901其他项同master的配置.citusdb@db-172-16-3-150-> vi /data06/citusdb/work02/postgresql.confport = 9902其他项同master的配置.citusdb@db-172-16-3-150-> vi /data06/citusdb/work03/postgresql.confport = 9903其他项同master的配置.citusdb@db-172-16-3-150-> vi /data06/citusdb/work04/postgresql.confport = 9904其他项同master的配置.
citusdb@db-172-16-3-150-> vi /data06/citusdb/work01/pg_hba.confcitusdb@db-172-16-3-150-> vi /data06/citusdb/work02/pg_hba.confcitusdb@db-172-16-3-150-> vi /data06/citusdb/work03/pg_hba.confcitusdb@db-172-16-3-150-> vi /data06/citusdb/work04/pg_hba.confhost all all 1.1.1.0/32 trusthost all all 0.0.0.0/0 md5
6. 启动数据库
master :
citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/master
worker :
citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work01citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work02citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work03citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work04
7. 使用
# 连接到主节点, 以及所有work节点, 创建测试数据库, schema
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 -U postgres postgrescitusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9901 -U postgres postgrescitusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9902 -U postgres postgrescitusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9903 -U postgres postgrescitusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9904 -U postgres postgrescreate role digoal nosuperuser login encrypted password 'digoal';create database digoal with owner digoal;\c digoal digoal# 注意目前citusdb对schema的支持有BUG, 会造成无法查询数据, 后面有例子.create schema digoal;
# 在主节点创建测试表
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 -U postgres postgrespsql (9.2.1)Type "help" for help.digoal=# \c digoal digoalYou are now connected to database "digoal" as user "digoal".digoal=> CREATE TABLE customer_reviews(customer_id TEXT not null,review_date DATE not null,review_rating INTEGER not null,review_votes INTEGER,review_helpful_votes INTEGER,product_id CHAR(10) not null,product_title TEXT not null,product_sales_rank BIGINT,product_group TEXT,product_category TEXT,product_subcategory TEXT,similar_product_ids CHAR(10)[])DISTRIBUTE BY APPEND (review_date);digoal=> CREATE INDEX customer_id_index ON customer_reviews (customer_id);CREATE INDEX
8.
在主节点
下载测试数据
wget http://examples.citusdata.com/customer_reviews_1999.csv.gzwget http://examples.citusdata.com/customer_reviews_1998.csv.gzgunzip customer_reviews_1998.csv.gzgunzip customer_reviews_1999.csv.gz
9.
在主节点
导入数据, 注意目前往citusdb中导入数据必须从work节点使用psql连接到master节点.
导入distributed 表数据用的是SATGE命令, 和COPY命令的语法类似.
citusdb@db-172-16-3-150-> psql -h 1.1.1.1 -p 9900 digoal postgrespsql (9.2.1)Type "help" for help.digoal=# digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);digoal=# select * from pg_dist_partition ;logicalrelid | partmethod | partkey--------------+------------+---------------------------------------------------------------------------------------------------------------------------16467 | a | {VAR :varno 1 :varattno 2 :vartype 1082 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 428}(1 row)# 查看主节点数据库中的shard信息digoal=# select * from pg_dist_shard;logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue--------------+---------+--------------+------------+---------------+---------------16467 | 102010 | t | | 1970-12-30 | 1998-12-31(1 row)
digoal=# select * from pg_dist_shard_placement;shardid | shardstate | shardlength | nodename | nodeport---------+------------+-------------+----------+----------102010 | 1 | 145932288 | work01 | 9901102010 | 1 | 145932288 | work04 | 9904(2 rows)
# worker节点中不存储shard信息 :
digoal=# select * from pg_dist_shard;logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue--------------+---------+--------------+------------+---------------+---------------(0 rows)
digoal=# select * from pg_dist_shard_placement;shardid | shardstate | shardlength | nodename | nodeport---------+------------+-------------+----------+----------(0 rows)
digoal=# select * from pg_dist_partition ;logicalrelid | partmethod | partkey--------------+------------+---------(0 rows)
# 如果不是从worker节点通过psql连接到master节点, 将会报错, 错误信息 :
citusdb@db-172-16-3-150-> psql -h 172.16.3.150 digoal postgresPassword for user postgres:psql (9.2.1)Type "help" for help.digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR: could not find worker node for hostname: db-172-16-3-150.sky-mobi.com
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 digoal postgrespsql (9.2.1)Type "help" for help.digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR: could not find worker node for hostname: localhost.localdomain
# 注释/etc/hosts中对应的主机# 127.0.0.1 localhost.localdomain localhost# 172.16.3.150 db-172-16-3-150.sky-mobi.com db-172-16-3-1501.1.1.1 work011.1.1.2 work021.1.1.3 work031.1.1.4 work04
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 digoal postgrespsql (9.2.1)Type "help" for help.digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR: could not find worker node for hostname: localhost
# 解决以上错误的方法, 注意这句话
CitusDB currently requires that you log into one of the worker nodes to stage data, and connect to the master node from the worker node using psql.
# 例如, 连接1.1.1.1:9900端口, 查看inet_client_addr.digoal=# select inet_client_addr();inet_client_addr------------------1.1.1.1(1 row)虽然work,master在同一台主机上, 但是这个场景确实是从work01节点去连master节点的, 为什么呢?1. master节点的 /etc/hosts中有1.1.1.1 work012. master的 pg_worker_list.conf中有work01 99013. 所以可以执行\STAGE导入数据.因此多主机的环境中也要满足这种需求才行.
为什么需要从worker节点连到master节点去执行导入数据的动作呢?
原因参见 :
# 继续导入另一个文件
citusdb@db-172-16-3-150-> psql -h 1.1.1.2 -p 9900 -U postgres digoalpsql (9.2.1)Type "help" for help.digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1999.csv' (FORMAT CSV);digoal=# select * from pg_dist_partition ;logicalrelid | partmethod | partkey--------------+------------+---------------------------------------------------------------------------------------------------------------------------16467 | a | {VAR :varno 1 :varattno 2 :vartype 1082 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 428}(1 row)
digoal=# select * from pg_dist_shard;logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue--------------+---------+--------------+------------+---------------+---------------16467 | 102010 | t | | 1970-12-30 | 1998-12-3116467 | 102014 | t | | 1999-01-01 | 1999-12-31(2 rows)
digoal=# select * from pg_dist_shard_placement;shardid | shardstate | shardlength | nodename | nodeport---------+------------+-------------+----------+----------102010 | 1 | 145932288 | work01 | 9901102010 | 1 | 145932288 | work04 | 9904102014 | 1 | 286695424 | work02 | 9902102014 | 1 | 286695424 | work03 | 9903(4 rows)
10.
在主节点
查询表数据
# 由于目前citusDB对schema支持有问题, 会有如下错误 :
digoal=# \set VERBOSITY verbosedigoal=# select count(*) from digoal.customer_reviews ;WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460WARNING: 01000: could not receive query resultsDETAIL: Client result status: PGRES_FATAL_ERRORLOCATION: ClientQueryStatus, libpqclientexecutor.c:460ERROR: XX000: failed to execute job 5DETAIL: Failure due to failed task 1LOCATION: MultiServerExecute, multi_server_executor.c:181digoal=# \q
# 目前CitusDB貌似不支持schema
# 在public中创建则没有以上问题
# 连接到所有节点, 删除digoal schema
# 在主节点的public下面重新新建表
digoal=# CREATE TABLE customer_reviews(customer_id TEXT not null,review_date DATE not null,review_rating INTEGER not null,review_votes INTEGER,review_helpful_votes INTEGER,product_id CHAR(10) not null,product_title TEXT not null,product_sales_rank BIGINT,product_group TEXT,product_category TEXT,product_subcategory TEXT,similar_product_ids CHAR(10)[])DISTRIBUTE BY APPEND (review_date);digoal=# CREATE INDEX customer_id_index ON customer_reviews (customer_id);
# 在任意worker节点通过psql连接到master节点, 导入数据 :
digoal=# \STAGE customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);digoal=# select count(*) from customer_reviews ;count--------589859(1 row)digoal=# explain analyze verbose select count(*) from customer_reviews ;QUERY PLAN----------------------------------------------------------------------explain statements for distributed queries are currently unsupported(1 row)digoal=# \STAGE customer_reviews FROM '/home/citusdb/customer_reviews_1999.csv' (FORMAT CSV);digoal=# select count(*) from customer_reviews ;count---------1762504(1 row)
# 其他查询测试
digoal=# SELECTwidth_bucket(length(product_title), 1, 50, 5) title_length_bucket,round(avg(review_rating), 2) AS review_average,count(*)FROMcustomer_reviewsWHEREproduct_group = 'Book'GROUP BYtitle_length_bucketORDER BYtitle_length_bucket;title_length_bucket | review_average | count---------------------+----------------+--------1 | 4.26 | 1390342 | 4.24 | 4113183 | 4.34 | 2456714 | 4.32 | 1673615 | 4.30 | 1184226 | 4.40 | 116412(6 rows)digoal=# SELECTcustomer_id, review_date, review_rating, product_id, product_titleFROMcustomer_reviewsWHEREcustomer_id ='A27T7HVDXA3K2A' ANDproduct_title LIKE '%Dune%' ANDreview_date >= '1998-01-01' ANDreview_date <= '1998-12-31';customer_id | review_date | review_rating | product_id | product_title----------------+-------------+---------------+------------+-----------------------------------------------A27T7HVDXA3K2A | 1998-04-10 | 5 | 1559949570 | Dune Audio CollectionA27T7HVDXA3K2A | 1998-04-10 | 5 | 0881036366 | Dune (Dune Chronicles (Econo-Clad Hardcover))A27T7HVDXA3K2A | 1998-04-10 | 5 | 0441172717 | Dune (Dune Chronicles, Book 1)A27T7HVDXA3K2A | 1998-04-10 | 5 | 044100590X | DuneA27T7HVDXA3K2A | 1998-04-10 | 5 | 0399128964 | Dune (Dune Chronicles (Econo-Clad Hardcover))(5 rows)
1. 数据导入注意事项, 导入数据必须从work节点去连接master节点. 否则会报异常.
2. schema 问题, 目前在CitusDB中使用public以外的schema将会造成查询异常. 所以现在建议使用public.
3. CitusDB新增的相关参数见本文参考部分.
【参考】
3. postgresql.conf中citusdb相关部分 :
#------------------------------------------------------------------------------# DISTRIBUTED DATABASE#------------------------------------------------------------------------------
#max_worker_nodes_tracked = 2048 # maximum worker nodes that are tracked# (change requires restart)
#shard_replication_factor = 2 # number of replicas to maintain, 多少份拷贝#shard_max_size = 2GB # maximum size a shard will grow, 达到多大后开始shard# (stored permanently in system catalogs)
#remote_task_check_interval = 100ms # 10-100000 milliseconds between checks#task_tracker_active = off # starts up the task tracker process# (change requires restart)#task_tracker_delay = 200ms # 10-100000 milliseconds between rounds#max_tracked_tasks_per_node = 256 # maximum tasks that are tracked per node# (change requires restart)#max_running_tasks_per_node = 8 # maximum tasks to run concurrently
#partition_buffer_size = 8MB # per process repartitioning buffer size#large_table_shard_count = 4 # table considered large after this size#limit_clause_row_fetch_count = -1 # number of rows to fetch from each task#task_assignment_policy = 'greedy' # task assignment policy to use
4. distributed表在主节点的INSERT, UPDATE, DELETE语句不支持.
digoal=# update customer_reviews set review_rating=review_rating where customer_id='AE22YDHSBFYIP';ERROR: 0A000: cannot execute UPDATE on a distributed table on master nodeLOCATION: PreventCommandIfMasterTable, utility.c:360
digoal=# insert into customer_reviews select * from customer_reviews limit 1;ERROR: 0A000: cannot execute INSERT on a distributed table on master nodeLOCATION: PreventCommandIfMasterTable, utility.c:360
digoal=# delete from customer_reviews ;ERROR: 0A000: cannot execute DELETE on a distributed table on master nodeLOCATION: PreventCommandIfMasterTable, utility.c:360
普通表的DML是支持的 :
digoal=# create table test(id int);CREATE TABLEdigoal=# insert into test values (1);INSERT 0 1