CitusDB, PostgreSQLs Use Hadoop Distribute Query - 1 : Single HOST CitusDB Cluster install

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

  • Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
  • Postgres-XL的项目发起人Mason Sharp
  • pgpool的作者石井达夫(Tatsuo Ishii)
  • PG-Strom的作者海外浩平(Kaigai Kohei)
  • Greenplum研发总监姚延栋
  • 周正中(德哥), PostgreSQL中国用户会创始人之一
  • 汪洋,平安科技数据库技术部经理
  • ……


 
  • 2015年度PG大象会报名地址:http://postgres2015.eventdove.com/
  • PostgreSQL中国社区: http://postgres.cn/
  • PostgreSQL专业1群: 3336901(已满)
  • PostgreSQL专业2群: 100910388
  • PostgreSQL专业3群: 150657323


测试环境 : 

CentOS release 5.7 (Final)
内存 : 96GB
CPU : 2 * Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
硬盘 : OCZ RevoDrive3


1. 下载操作系统对应的citusdb包

wget http://packages.citusdata.com/readline-5.0/citusdb-2.0.0-1.x86_64.rpm


2. 安装

su - root
rpm -ivh citusdb-2.0.0-1.x86_64.rpm
# 看看citusdb安装包的内容
[root@db-172-16-3-150 ~]# rpm -qa|grep citus
citusdb-2.0-2.0.0-1
# 通过rpm -ql查看citusdb安装包的内容
[root@db-172-16-3-150 ~]# rpm -ql citusdb-2.0-2.0.0-1
# 从输出可以看到, 文件被解压到 /opt/citusdb

# 新增用户

[root@db-172-16-3-150 ~]# useradd citusdb

# 修改citusdb权限

[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /opt/citusdb

# 配置用户环境变量

[root@db-172-16-3-150 opt]# su - citusdb
citusdb@db-172-16-3-150-> vi .bash_profile 
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=9900
export PGUSER=postgres
export PGDATA=/data06/citusdb/master
export PGHOST=$PGDATA
export PGDATABASE=digoal
export LANG=en_US.utf8
export PGHOME=/opt/citusdb/2.0
export LD_LIBRARY_PATH=$PGHOME/lib:$PGHOME/lib/postgresql:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
#export MANPATH=$PGHOME/share/man:$MANPATH
export LD_PRELOAD=/usr/lib64/libncurses.so
alias rm='rm -i'
alias ll='ls -lh'

#注意这里的LD_LIBRARY_PATH与postgresql 的LD_LIBRARY_PATH路径不一样, 多了一个路径lib/postgresql.
原因如下 : 

citusdb@db-172-16-3-150-> cd /opt/citusdb/2.0/lib/
citusdb@db-172-16-3-150-> ll
total 984K
-rw-r--r-- 1 citusdb citusdb 147K Jan 14 21:04 libecpg.a
-rw-r--r-- 1 citusdb citusdb  19K Jan 14 21:04 libecpg_compat.a
lrwxrwxrwx 1 citusdb citusdb   21 Mar 15 07:54 libecpg_compat.so -> libecpg_compat.so.3.4
lrwxrwxrwx 1 citusdb citusdb   21 Mar 15 07:54 libecpg_compat.so.3 -> libecpg_compat.so.3.4
-rwxr-xr-x 1 citusdb citusdb  23K Jan 14 21:04 libecpg_compat.so.3.4
lrwxrwxrwx 1 citusdb citusdb   14 Mar 15 07:54 libecpg.so -> libecpg.so.6.4
lrwxrwxrwx 1 citusdb citusdb   14 Mar 15 07:54 libecpg.so.6 -> libecpg.so.6.4
-rwxr-xr-x 1 citusdb citusdb  85K Jan 14 21:04 libecpg.so.6.4
-rw-r--r-- 1 citusdb citusdb  65K Jan 14 21:02 libpgport.a
-rw-r--r-- 1 citusdb citusdb  99K Jan 14 21:04 libpgtypes.a
lrwxrwxrwx 1 citusdb citusdb   17 Mar 15 07:54 libpgtypes.so -> libpgtypes.so.3.3
lrwxrwxrwx 1 citusdb citusdb   17 Mar 15 07:54 libpgtypes.so.3 -> libpgtypes.so.3.3
-rwxr-xr-x 1 citusdb citusdb  68K Jan 14 21:04 libpgtypes.so.3.3
-rw-r--r-- 1 citusdb citusdb 262K Jan 14 21:04 libpq.a
lrwxrwxrwx 1 citusdb citusdb   12 Mar 15 07:54 libpq.so -> libpq.so.5.5
lrwxrwxrwx 1 citusdb citusdb   12 Mar 15 07:54 libpq.so.5 -> libpq.so.5.5
-rwxr-xr-x 1 citusdb citusdb 172K Jan 14 21:04 libpq.so.5.5
drwxr-xr-x 3 citusdb citusdb 4.0K Mar 19 14:16 postgresql
citusdb@db-172-16-3-150-> cd postgresql/
citusdb@db-172-16-3-150-> ll
total 4.4M
-rwxr-xr-x 1 citusdb citusdb 7.1K Jan 14 21:04 ascii_and_mic.so
-rwxr-xr-x 1 citusdb citusdb  18K Jan 14 21:04 cyrillic_and_mic.so
-rwxr-xr-x 1 citusdb citusdb 459K Jan 14 21:04 dict_snowball.so
-rwxr-xr-x 1 citusdb citusdb 8.9K Jan 14 21:04 euc2004_sjis2004.so
-rwxr-xr-x 1 citusdb citusdb 7.4K Jan 14 21:04 euc_cn_and_mic.so
-rwxr-xr-x 1 citusdb citusdb  15K Jan 14 21:04 euc_jp_and_sjis.so
-rwxr-xr-x 1 citusdb citusdb 7.5K Jan 14 21:04 euc_kr_and_mic.so
-rwxr-xr-x 1 citusdb citusdb  14K Jan 14 21:04 euc_tw_and_big5.so
-rwxr-xr-x 1 root    root     22K Mar 19 14:16 file_fdw.so
-rwxr-xr-x 1 citusdb citusdb  11K Jan 14 21:04 latin2_and_win1250.so
-rwxr-xr-x 1 citusdb citusdb 9.3K Jan 14 21:04 latin_and_mic.so
-rwxr-xr-x 1 citusdb citusdb  17K Jan 14 21:05 libpqclientexecutor.so
-rwxr-xr-x 1 citusdb citusdb  15K Jan 14 21:05 libpqwalreceiver.so
drwxr-xr-x 4 citusdb citusdb 4.0K Mar 15 07:54 pgxs
-rwxr-xr-x 1 citusdb citusdb 171K Jan 14 21:05 plpgsql.so
-rwxr-xr-x 1 citusdb citusdb 7.1K Jan 14 21:04 utf8_and_ascii.so
-rwxr-xr-x 1 citusdb citusdb 224K Jan 14 21:04 utf8_and_big5.so
-rwxr-xr-x 1 citusdb citusdb  13K Jan 14 21:04 utf8_and_cyrillic.so
-rwxr-xr-x 1 citusdb citusdb 189K Jan 14 21:04 utf8_and_euc2004.so
-rwxr-xr-x 1 citusdb citusdb 124K Jan 14 21:04 utf8_and_euc_cn.so
-rwxr-xr-x 1 citusdb citusdb 215K Jan 14 21:04 utf8_and_euc_jp.so
-rwxr-xr-x 1 citusdb citusdb 136K Jan 14 21:04 utf8_and_euc_kr.so
-rwxr-xr-x 1 citusdb citusdb 332K Jan 14 21:04 utf8_and_euc_tw.so
-rwxr-xr-x 1 citusdb citusdb 999K Jan 14 21:04 utf8_and_gb18030.so
-rwxr-xr-x 1 citusdb citusdb 348K Jan 14 21:04 utf8_and_gbk.so
-rwxr-xr-x 1 citusdb citusdb 7.7K Jan 14 21:04 utf8_and_iso8859_1.so
-rwxr-xr-x 1 citusdb citusdb  36K Jan 14 21:04 utf8_and_iso8859.so
-rwxr-xr-x 1 citusdb citusdb 276K Jan 14 21:04 utf8_and_johab.so
-rwxr-xr-x 1 citusdb citusdb 188K Jan 14 21:04 utf8_and_sjis2004.so
-rwxr-xr-x 1 citusdb citusdb 128K Jan 14 21:04 utf8_and_sjis.so
-rwxr-xr-x 1 citusdb citusdb 279K Jan 14 21:04 utf8_and_uhc.so
-rwxr-xr-x 1 citusdb citusdb  31K Jan 14 21:04 utf8_and_win.so


3. 初始化主节点以及work节点数据库
master : 

su - root
[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/master
[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /data06/citusdb
[root@db-172-16-3-150 opt]# su - citusdb
citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/master -E UTF8 --locale=C -U postgres -W
.. 输入postgres用户密码
worker : 

[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work01
[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work02
[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work03
[root@db-172-16-3-150 opt]# mkdir -p /data06/citusdb/work04
[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /data06/citusdb\
[root@db-172-16-3-150 opt]# su - citusdb
citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work01 -E UTF8 --locale=C -U postgres -W
.. 输入postgres用户密码
citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work02 -E UTF8 --locale=C -U postgres -W
.. 输入postgres用户密码
citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work03 -E UTF8 --locale=C -U postgres -W
.. 输入postgres用户密码
citusdb@db-172-16-3-150-> initdb -D /data06/citusdb/work04 -E UTF8 --locale=C -U postgres -W
.. 输入postgres用户密码


4. 操作系统配置
配置sysctl.conf

[root@db-172-16-3-150 ~]# vi /etc/sysctl.conf
# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536
# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 1280
fs.file-max = 7672460
net.ipv4.ip_local_port_range = 9000 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 10000
net.ipv4.ip_conntrack_max = 655360
fs.aio-max-nr = 1048576
net.ipv4.tcp_timestamps = 0
vm.overcommit_memory = 0
[root@db-172-16-3-150 ~]# sysctl -p

配置 /etc/security/limits.conf

[root@db-172-16-3-150 ~]# vi /etc/security/limits.conf
* soft    nofile  131072
* hard    nofile  131072
* soft    nproc   131072
* hard    nproc   131072
* soft    core    unlimited
* hard    core    unlimited
* soft    memlock 50000000
* hard    memlock 50000000

# 配置模拟work数据库的IP

[root@db-172-16-3-150 opt]# ifconfig lo:1 1.1.1.1/32
[root@db-172-16-3-150 opt]# ifconfig lo:2 1.1.1.2/32
[root@db-172-16-3-150 opt]# ifconfig lo:3 1.1.1.3/32
[root@db-172-16-3-150 opt]# ifconfig lo:4 1.1.1.4/32
[root@db-172-16-3-150 opt]# ip route list table local
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1 
broadcast 172.16.3.255 dev eth0  proto kernel  scope link  src 172.16.3.150 
local 1.1.1.2 dev lo  proto kernel  scope host  src 1.1.1.1 
local 1.1.1.3 dev lo  proto kernel  scope host  src 1.1.1.1 
local 1.1.1.1 dev lo  proto kernel  scope host  src 1.1.1.1 
broadcast 172.16.3.0 dev eth0  proto kernel  scope link  src 172.16.3.150 
local 1.1.1.4 dev lo  proto kernel  scope host  src 1.1.1.1 
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1 
local 172.16.3.150 dev eth0  proto kernel  scope host  src 172.16.3.150 
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1

# 由于我这里没有使用DNS, 所以需要配置/etc/hosts

[root@db-172-16-3-150 ~]# vi /etc/hosts
# 注意如果没有DNS则需要配置/etc/hosts
1.1.1.1    work01
1.1.1.2    work02
1.1.1.3    work03
1.1.1.4    work04


5. 数据库配置
master : 

citusdb@db-172-16-3-150-> cd /data06/citusdb/master
citusdb@db-172-16-3-150-> vi postgresql.conf
listen_addresses = '0.0.0.0'
port = 9900
unix_socket_directory = '.'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 2048MB
maintenance_work_mem = 1024MB
max_stack_depth = 8MB
synchronous_commit = off
wal_buffers = 16384kB
wal_writer_delay = 10ms
checkpoint_segments = 32
random_page_cost = 1.0
effective_cache_size = 81920MB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
autovacuum = on
log_autovacuum_min_duration = 0

citusdb@db-172-16-3-150-> vi pg_hba.conf
host all all 1.1.1.0/8 trust
host all all 0.0.0.0/0 md5

citusdb@db-172-16-3-150-> vi pg_worker_list.conf
work01 9901
work02 9902
work03 9903
work04 9904

worker : 

citusdb@db-172-16-3-150-> vi /data06/citusdb/work01/postgresql.conf
port = 9901
其他项同master的配置.
citusdb@db-172-16-3-150-> vi /data06/citusdb/work02/postgresql.conf
port = 9902
其他项同master的配置.
citusdb@db-172-16-3-150-> vi /data06/citusdb/work03/postgresql.conf
port = 9903
其他项同master的配置.
citusdb@db-172-16-3-150-> vi /data06/citusdb/work04/postgresql.conf
port = 9904
其他项同master的配置.

citusdb@db-172-16-3-150-> vi /data06/citusdb/work01/pg_hba.conf
citusdb@db-172-16-3-150-> vi /data06/citusdb/work02/pg_hba.conf
citusdb@db-172-16-3-150-> vi /data06/citusdb/work03/pg_hba.conf
citusdb@db-172-16-3-150-> vi /data06/citusdb/work04/pg_hba.conf
host all all 1.1.1.0/32 trust
host all all 0.0.0.0/0 md5


6. 启动数据库
master : 

citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/master

worker : 

citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work01
citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work02
citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work03
citusdb@db-172-16-3-150-> pg_ctl start -D /data06/citusdb/work04


7. 使用
# 连接到主节点, 以及所有work节点, 创建测试数据库, schema

citusdb@db-172-16-3-150-> psql -127.0.0.1 -p 9900 -U postgres postgres
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9901 -U postgres postgres
citusdb@db-172-16-3-150-> psql -127.0.0.1 -p 9902 -U postgres postgres
citusdb@db-172-16-3-150-> psql -127.0.0.1 -p 9903 -U postgres postgres
citusdb@db-172-16-3-150-> psql -127.0.0.1 -p 9904 -U postgres postgres
create role digoal nosuperuser login encrypted password 'digoal';
create database digoal with owner digoal;
\c digoal digoal
# 注意目前citusdb对schema的支持有BUG, 会造成无法查询数据, 后面有例子. 
create schema digoal;

# 在主节点创建测试表

citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 -U postgres postgres
psql (9.2.1)
Type "help" for help.
digoal=# \c digoal digoal
You are now connected to database "digoal" as user "digoal".
digoal=> CREATE TABLE customer_reviews                                    
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date);
digoal=> CREATE INDEX customer_id_index ON customer_reviews (customer_id);
CREATE INDEX


8.  在主节点 下载测试数据

wget http://examples.citusdata.com/customer_reviews_1999.csv.gz
wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
gunzip customer_reviews_1998.csv.gz
gunzip customer_reviews_1999.csv.gz


9.  在主节点 导入数据, 注意目前往citusdb中导入数据必须从work节点使用psql连接到master节点.
导入distributed 表数据用的是SATGE命令, 和COPY命令的语法类似.

citusdb@db-172-16-3-150-> psql -h 1.1.1.1 -p 9900 digoal postgres
psql (9.2.1)
Type "help" for help.
digoal=# digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);
digoal=# select * from pg_dist_partition ;
 logicalrelid | partmethod |                                                          partkey                                       
                   
--------------+------------+--------------------------------------------------------------------------------------------------------
-------------------
        16467 | a          | {VAR :varno 1 :varattno 2 :vartype 1082 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattn
o 2 :location 428}
(1 row)
# 查看主节点数据库中的shard信息
digoal=# select * from pg_dist_shard;
 logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue 
--------------+---------+--------------+------------+---------------+---------------
        16467 |  102010 | t            |            | 1970-12-30    | 1998-12-31
(1 row)

digoal=# select * from pg_dist_shard_placement;
 shardid | shardstate | shardlength | nodename | nodeport 
---------+------------+-------------+----------+----------
  102010 |          1 |   145932288 | work01   |     9901
  102010 |          1 |   145932288 | work04   |     9904
(2 rows)

# worker节点中不存储shard信息  : 

digoal=# select * from pg_dist_shard;
 logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue 
--------------+---------+--------------+------------+---------------+---------------
(0 rows)

digoal=# select * from pg_dist_shard_placement;
 shardid | shardstate | shardlength | nodename | nodeport 
---------+------------+-------------+----------+----------
(0 rows)

digoal=# select * from pg_dist_partition ;
 logicalrelid | partmethod | partkey 
--------------+------------+---------
(0 rows)


# 如果不是从worker节点通过psql连接到master节点, 将会报错, 错误信息 : 

citusdb@db-172-16-3-150-> psql -h 172.16.3.150 digoal postgres
Password for user postgres: 
psql (9.2.1)
Type "help" for help.
digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);
remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR:  could not find worker node for hostname: db-172-16-3-150.sky-mobi.com

citusdb@db-172-16-3-150-> psql -h 127.0.0.1 digoal postgres
psql (9.2.1)
Type "help" for help.
digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);
remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR:  could not find worker node for hostname: localhost.localdomain

# 注释/etc/hosts中对应的主机
# 127.0.0.1              localhost.localdomain localhost
# 172.16.3.150 db-172-16-3-150.sky-mobi.com db-172-16-3-150
1.1.1.1 work01
1.1.1.2 work02
1.1.1.3 work03
1.1.1.4 work04

citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 digoal postgres
psql (9.2.1)
Type "help" for help.
digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);
remote command "SELECT * FROM master_get_candidate_nodes()" failed with ERROR:  could not find worker node for hostname: localhost


# 解决以上错误的方法, 注意这句话

CitusDB currently requires that you log into one of the worker nodes to stage data, and connect to the master node from the worker node using psql.

# 例如, 连接1.1.1.1:9900端口, 查看inet_client_addr.
digoal=# select inet_client_addr();
 inet_client_addr 
------------------
 1.1.1.1
(1 row)
虽然work,master在同一台主机上, 但是这个场景确实是从work01节点去连master节点的, 为什么呢?
1. master节点的 /etc/hosts中有1.1.1.1 work01
2. master的 pg_worker_list.conf中有work01 9901
3. 所以可以执行\STAGE导入数据.
因此多主机的环境中也要满足这种需求才行.

为什么需要从worker节点连到master节点去执行导入数据的动作呢?
原因参见 : 

# 继续导入另一个文件

citusdb@db-172-16-3-150-> psql -h 1.1.1.2 -p 9900 -U postgres digoal
psql (9.2.1)
Type "help" for help.
digoal=# \STAGE digoal.customer_reviews FROM '/home/citusdb/customer_reviews_1999.csv' (FORMAT CSV);
digoal=# select * from pg_dist_partition ;
 logicalrelid | partmethod |                                                          partkey                                       
                   
--------------+------------+--------------------------------------------------------------------------------------------------------
-------------------
        16467 | a          | {VAR :varno 1 :varattno 2 :vartype 1082 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattn
o 2 :location 428}
(1 row)

digoal=# select * from pg_dist_shard;
 logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue 
--------------+---------+--------------+------------+---------------+---------------
        16467 |  102010 | t            |            | 1970-12-30    | 1998-12-31
        16467 |  102014 | t            |            | 1999-01-01    | 1999-12-31
(2 rows)

digoal=# select * from pg_dist_shard_placement;
 shardid | shardstate | shardlength | nodename | nodeport 
---------+------------+-------------+----------+----------
  102010 |          1 |   145932288 | work01   |     9901
  102010 |          1 |   145932288 | work04   |     9904
  102014 |          1 |   286695424 | work02   |     9902
  102014 |          1 |   286695424 | work03   |     9903
(4 rows)


10.  在主节点 查询表数据
# 由于目前citusDB对schema支持有问题, 会有如下错误 : 

digoal=# \set VERBOSITY verbose
digoal=# select count(*) from digoal.customer_reviews ;
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
WARNING:  01000: could not receive query results
DETAIL:  Client result status: PGRES_FATAL_ERROR
LOCATION:  ClientQueryStatus, libpqclientexecutor.c:460
ERROR:  XX000: failed to execute job 5
DETAIL:  Failure due to failed task 1
LOCATION:  MultiServerExecute, multi_server_executor.c:181
digoal=# \q

# 目前CitusDB貌似不支持schema
# 在public中创建则没有以上问题
# 连接到所有节点, 删除digoal schema
# 在主节点的public下面重新新建表

digoal=# CREATE TABLE customer_reviews                                    
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date);
digoal=# CREATE INDEX customer_id_index ON customer_reviews (customer_id);

# 在任意worker节点通过psql连接到master节点, 导入数据 : 

digoal=# \STAGE customer_reviews FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV);
digoal=# select count(*) from customer_reviews ;
 count  
--------
 589859
(1 row)
digoal=# explain analyze verbose select count(*) from customer_reviews ;
                              QUERY PLAN                              
----------------------------------------------------------------------
 explain statements for distributed queries are currently unsupported
(1 row)
digoal=# \STAGE customer_reviews FROM '/home/citusdb/customer_reviews_1999.csv' (FORMAT CSV);
digoal=# select count(*) from customer_reviews ;
  count  
---------
 1762504
(1 row)

# 其他查询测试

digoal=# SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
   customer_reviews
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;
 title_length_bucket | review_average | count  
---------------------+----------------+--------
                   1 |           4.26 | 139034
                   2 |           4.24 | 411318
                   3 |           4.34 | 245671
                   4 |           4.32 | 167361
                   5 |           4.30 | 118422
                   6 |           4.40 | 116412
(6 rows)
digoal=# SELECT
    customer_id, review_date, review_rating, product_id, product_title
FROM                                               
    customer_reviews
WHERE
    customer_id ='A27T7HVDXA3K2A' AND
    product_title LIKE '%Dune%' AND
    review_date >= '1998-01-01' AND
    review_date <= '1998-12-31';
  customer_id   | review_date | review_rating | product_id |                 product_title                 
----------------+-------------+---------------+------------+-----------------------------------------------
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 1559949570 | Dune Audio Collection
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0881036366 | Dune (Dune Chronicles (Econo-Clad Hardcover))
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0441172717 | Dune (Dune Chronicles, Book 1)
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 044100590X | Dune
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0399128964 | Dune (Dune Chronicles (Econo-Clad Hardcover))
(5 rows)


【注意】
1. 数据导入注意事项, 导入数据必须从work节点去连接master节点. 否则会报异常.
2. schema 问题, 目前在CitusDB中使用public以外的schema将会造成查询异常. 所以现在建议使用public.
3. CitusDB新增的相关参数见本文参考部分.

【参考】
3. postgresql.conf中citusdb相关部分 : 

#------------------------------------------------------------------------------
# DISTRIBUTED DATABASE
#------------------------------------------------------------------------------

#max_worker_nodes_tracked = 2048        # maximum worker nodes that are tracked
                                        # (change requires restart)

#shard_replication_factor = 2           # number of replicas to maintain, 多少份拷贝
#shard_max_size = 2GB                   # maximum size a shard will grow, 达到多大后开始shard
                                        # (stored permanently in system catalogs)

#remote_task_check_interval = 100ms     # 10-100000 milliseconds between checks
#task_tracker_active = off              # starts up the task tracker process
                                        # (change requires restart)
#task_tracker_delay = 200ms             # 10-100000 milliseconds between rounds
#max_tracked_tasks_per_node = 256       # maximum tasks that are tracked per node
                                        # (change requires restart)
#max_running_tasks_per_node = 8         # maximum tasks to run concurrently

#partition_buffer_size = 8MB            # per process repartitioning buffer size
#large_table_shard_count = 4            # table considered large after this size
#limit_clause_row_fetch_count = -1      # number of rows to fetch from each task
#task_assignment_policy = 'greedy'      # task assignment policy to use

4. distributed表在主节点的INSERT, UPDATE, DELETE语句不支持.

digoal=# update customer_reviews set review_rating=review_rating where customer_id='AE22YDHSBFYIP';
ERROR:  0A000: cannot execute UPDATE on a distributed table on master node
LOCATION:  PreventCommandIfMasterTable, utility.c:360

digoal=# insert into customer_reviews select * from customer_reviews limit 1;
ERROR:  0A000: cannot execute INSERT on a distributed table on master node
LOCATION:  PreventCommandIfMasterTable, utility.c:360

digoal=# delete from customer_reviews ;
ERROR:  0A000: cannot execute DELETE on a distributed table on master node
LOCATION:  PreventCommandIfMasterTable, utility.c:360

普通表的DML是支持的 : 

digoal=# create table test(id int);
CREATE TABLE
digoal=# insert into test values (1);
INSERT 0 1
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值