ClickHouse Deployment and Distributed Tables

This article walks through installing the ClickHouse server and client via yum, configuring ulimit and cluster settings, and creating and operating distributed tables, demonstrating how data is distributed across shards and replicated within them.

Deployment

[root@slave1 ~]# yum install yum-utils
[root@slave1 ~]# rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
[root@slave1 ~]# yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
Loaded plugins: fastestmirror
adding repo from: https://repo.clickhouse.tech/rpm/stable/x86_64

[repo.clickhouse.tech_rpm_stable_x86_64]
name=added from: https://repo.clickhouse.tech/rpm/stable/x86_64
baseurl=https://repo.clickhouse.tech/rpm/stable/x86_64
enabled=1

[root@slave1 ~]# yum install clickhouse-server clickhouse-client
Loaded plugins: fastestmirror
repo.clickhouse.tech_rpm_stable_x86_64                                                                                                      | 3.0 kB  00:00:00     
repo.clickhouse.tech_rpm_stable_x86_64/primary_db                                                                                           | 267 kB  00:00:01     
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: hk.mirrors.thegigabit.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package clickhouse-client.noarch 0:20.9.3.45-2 will be installed
--> Processing Dependency: clickhouse-common-static = 20.9.3.45-2 for package: clickhouse-client-20.9.3.45-2.noarch
---> Package clickhouse-server.noarch 0:20.9.3.45-2 will be installed
--> Running transaction check
---> Package clickhouse-common-static.x86_64 0:20.9.3.45-2 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================
 Package                                    Arch                     Version                        Repository                                                Size
===================================================================================================================================================================
Installing:
 clickhouse-client                          noarch                   20.9.3.45-2                    repo.clickhouse.tech_rpm_stable_x86_64                   123 k
 clickhouse-server                          noarch                   20.9.3.45-2                    repo.clickhouse.tech_rpm_stable_x86_64                   147 k
Installing for dependencies:
 clickhouse-common-static                   x86_64                   20.9.3.45-2                    repo.clickhouse.tech_rpm_stable_x86_64                   135 M

Transaction Summary
===================================================================================================================================================================
Install  2 Packages (+1 Dependent package)

Total download size: 136 M
Installed size: 527 M
Is this ok [y/d/N]: y
Downloading packages:
(1/3): clickhouse-client-20.9.3.45-2.noarch.rpm                                                                                             | 123 kB  00:00:01     
(2/3): clickhouse-server-20.9.3.45-2.noarch.rpm                                                                                             | 147 kB  00:00:01     
(3/3): clickhouse-common-static-20.9.3.45-2.x86_64.rpm                                                                                      | 135 MB  00:02:28     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                              934 kB/s | 136 MB  00:02:28     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : clickhouse-common-static-20.9.3.45-2.x86_64                                                                                                     1/3 
  Installing : clickhouse-client-20.9.3.45-2.noarch                                                                                                            2/3 
  Installing : clickhouse-server-20.9.3.45-2.noarch                                                                                                            3/3 
Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /etc/systemd/system/clickhouse-server.service.
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
  Verifying  : clickhouse-client-20.9.3.45-2.noarch                                                                                                            1/3 
  Verifying  : clickhouse-common-static-20.9.3.45-2.x86_64                                                                                                     2/3 
  Verifying  : clickhouse-server-20.9.3.45-2.noarch                                                                                                            3/3 

Installed:
  clickhouse-client.noarch 0:20.9.3.45-2                                           clickhouse-server.noarch 0:20.9.3.45-2                                          

Dependency Installed:
  clickhouse-common-static.x86_64 0:20.9.3.45-2                                                                                                                    

Complete!
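
With the packages installed, start the server and verify that the client can connect (standard systemd and ClickHouse commands):

```bash
systemctl start clickhouse-server
systemctl status clickhouse-server
clickhouse-client          # connects to localhost:9000 by default
```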

Adjust the ulimit configuration

[root@master apache-zookeeper-3.5.8]# cat /etc/security/limits.d/clickhouse.conf
clickhouse      soft    nofile  1073741824
clickhouse      hard    nofile  1073741824
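
A quick way to confirm the running server actually picked up the new limit is to read it from /proc (a sketch; the process name matched by pidof may vary across versions):

```bash
# restart so the new limit is applied, then inspect the live process
systemctl restart clickhouse-server
grep 'Max open files' /proc/$(pidof clickhouse-server)/limits
```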

Cluster configuration in config.xml

Define the cluster topology and the ZooKeeper ensemble in the server configuration (the `incl` attributes allow these blocks to be supplied from an external substitutions file, /etc/metrika.xml by default):

<remote_servers incl="clickhouse_remote_servers" >
    <perftest_2shards_1replicas>
        <!-- shard 1 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>master</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- shard 2 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>slave1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>slave2</host>
                <port>9000</port>
            </replica>
        </shard>
    </perftest_2shards_1replicas>
</remote_servers>

<zookeeper incl="zookeeper-servers" optional="true" />
<zookeeper-servers>
    <node index="1">
        <host>master</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>slave1</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>slave2</host>
        <port>2181</port>
    </node>
</zookeeper-servers>
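
This walkthrough hardcodes the ZooKeeper path and replica name in every CREATE TABLE. An alternative, not used below and shown only as a sketch, is to define per-node macros in the server config so the same DDL works on every host via {shard} and {replica} substitution; the values here are illustrative for one node:

```xml
<!-- per-node values, e.g. on slave1 -->
<macros>
    <shard>01</shard>
    <replica>cluster01-01-1</replica>
</macros>
```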

Check the cluster topology

The cluster has two shards: one shard has a single replica, the other has two replicas.

master :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address────┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ perftest_2shards_1replicas │         1 │            1 │           1 │ master    │ 192.168.153.136 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ perftest_2shards_1replicas │         2 │            1 │           1 │ slave1    │                 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ perftest_2shards_1replicas │         2 │            1 │           2 │ slave2    │                 │ 9000 │        0 │ default │                  │            0 │                       0 │
└────────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴─────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

3 rows in set. Elapsed: 45.672 sec. 

Test the distributed table

Create the local tables on shard 2.
Node 1 (slave1):
slave1 :) CREATE TABLE image_label2 ( label_id UInt32,  label_name String,  insert_time Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01-01/image_label2','cluster01-01-1',insert_time, (label_id, insert_time), 8192)

CREATE TABLE image_label2
(
    `label_id` UInt32,
    `label_name` String,
    `insert_time` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01-01/image_label2', 'cluster01-01-1', insert_time, (label_id, insert_time), 8192)

Ok.

0 rows in set. Elapsed: 0.049 sec. 


Node 2 (slave2):
slave2 :) CREATE TABLE image_label2 ( label_id UInt32,  label_name String,  insert_time Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01-01/image_label2','cluster01-01-2',insert_time, (label_id, insert_time), 8192)

CREATE TABLE image_label2
(
    `label_id` UInt32,
    `label_name` String,
    `insert_time` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01-01/image_label2', 'cluster01-01-2', insert_time, (label_id, insert_time), 8192)

Ok.

0 rows in set. Elapsed: 0.070 sec. 

Create the local table on shard 1 (master):
master :) CREATE TABLE image_label2 ( label_id UInt32,  label_name String,  insert_time Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/02-01/image_label2','cluster02-01-1',insert_time, (label_id, insert_time), 8192)

CREATE TABLE image_label2
(
    `label_id` UInt32,
    `label_name` String,
    `insert_time` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/02-01/image_label2', 'cluster02-01-1', insert_time, (label_id, insert_time), 8192)

Ok.

0 rows in set. Elapsed: 0.059 sec. 
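
The statements above use ClickHouse's legacy ReplicatedMergeTree syntax, where the date column, primary key, and index granularity are passed as engine arguments. On newer releases the same table would be declared with explicit clauses; an equivalent sketch for the shard 1 table:

```sql
CREATE TABLE image_label2
(
    `label_id` UInt32,
    `label_name` String,
    `insert_time` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/02-01/image_label2', 'cluster02-01-1')
PARTITION BY toYYYYMM(insert_time)   -- the legacy date column becomes a monthly partition key
ORDER BY (label_id, insert_time)     -- the legacy primary-key tuple
SETTINGS index_granularity = 8192;
```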



Create the distributed table on all three nodes:
slave1 :) CREATE TABLE image_label_all2 AS image_label2 ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

CREATE TABLE image_label_all2 AS image_label2
ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

Ok.

0 rows in set. Elapsed: 0.003 sec. 

slave2 :) CREATE TABLE image_label_all2 AS image_label2 ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

CREATE TABLE image_label_all2 AS image_label2
ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

Ok.

0 rows in set. Elapsed: 0.002 sec. 

master :) CREATE TABLE image_label_all2 AS image_label2 ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

CREATE TABLE image_label_all2 AS image_label2
ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand())

Ok.

0 rows in set. Elapsed: 0.003 sec. 
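
Instead of repeating the CREATE on each node, the DDL can be broadcast once with ON CLUSTER, using the cluster defined earlier (a sketch; distributed DDL relies on the ZooKeeper ensemble configured above):

```sql
CREATE TABLE image_label_all2 ON CLUSTER perftest_2shards_1replicas AS image_label2
ENGINE = Distributed(perftest_2shards_1replicas, default, image_label2, rand());
```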


Write a row into the local table on shard 2, node 1 (slave1):
slave1 :) insert into image_label2(label_id,label_name)values(100,'test100');

INSERT INTO image_label2 (label_id, label_name) VALUES

Ok.

1 rows in set. Elapsed: 0.021 sec. 

slave1 :) select * from image_label2;

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

Because the two replicas of shard 2 replicate data between themselves, the local table on shard 2's other node (slave2) now has the row too:
slave2 :) select * from image_label2;

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
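
Replication state for the shard can be inspected through the system.replicas table (a quick check; the exact column set varies somewhat across versions):

```sql
SELECT table, replica_name, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'image_label2';
```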

The local table on shard 1 (master) has no data:
master :) select * from image_label2;

SELECT *
FROM image_label2

Ok.

0 rows in set. Elapsed: 0.003 sec. 

Query the distributed table on each node:
slave1 :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.007 sec. 

slave2 :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.009 sec. 

master :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.008 sec. 

Next, write a row into the local table on shard 1 (master):
master :) insert into image_label2(label_id,label_name)values(1000,'master1000');

INSERT INTO image_label2 (label_id, label_name) VALUES

Ok.

1 rows in set. Elapsed: 0.020 sec. 

master :) select * from image_label2

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.002 sec.

The local tables on shard 2's two nodes cannot see this row (replication happens within a shard, never across shards):
slave2 :) select * from image_label2;

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.002 sec. 

slave1 :) select * from image_label2;

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.002 sec. 

Query the distributed table again; all three nodes see both rows:
slave1 :)  select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘

2 rows in set. Elapsed: 0.006 sec. 


slave2 :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘

2 rows in set. Elapsed: 0.007 sec. 

master :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘

2 rows in set. Elapsed: 0.005 sec.

Test writing through the distributed table
master :) insert into image_label_all2(label_id,label_name)values(2000,'dist2000');

INSERT INTO image_label_all2 (label_id, label_name) VALUES

Ok.

1 rows in set. Elapsed: 0.003 sec. 

master :) select * from image_label_all2

SELECT *
FROM image_label_all2

┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│     2000 │ dist2000   │  1970-01-01 │
└──────────┴────────────┴─────────────┘

3 rows in set. Elapsed: 0.008 sec. 

master :) select * from image_label2

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│     1000 │ master1000 │  1970-01-01 │
└──────────┴────────────┴─────────────┘

1 rows in set. Elapsed: 0.001 sec. 

The row written through the distributed table landed on shard 2. Because internal_replication is enabled, the Distributed engine sent it to one replica and ReplicatedMergeTree replicated it to the other:
slave1 :) select * from image_label2

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│     2000 │ dist2000   │  1970-01-01 │
└──────────┴────────────┴─────────────┘
slave2 :) select * from image_label2

SELECT *
FROM image_label2

┌─label_id─┬─label_name─┬─insert_time─┐
│      100 │ test100    │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│     2000 │ dist2000   │  1970-01-01 │
└──────────┴────────────┴─────────────┘

2 rows in set. Elapsed: 0.002 sec. 
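
A compact way to see which shard serves which rows is to group by hostName() over the distributed table; the function is evaluated on the remote node that supplies each row (a sketch):

```sql
SELECT hostName() AS host, count() AS cnt
FROM image_label_all2
GROUP BY host;
```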

About ClickHouse distributed tables

Configuration

In ClickHouse, the `Distributed` engine creates a distributed logical view over local tables. A query against it is dispatched to the underlying local tables on multiple nodes, and the results are merged before being returned to the client. Creating a `Distributed` table requires the cluster name, the target database, the local table that actually stores the data, and a sharding key [^4].

```sql
CREATE TABLE distributed_table ON CLUSTER '{cluster}'
(
    `id` UInt64,
    ...
)
ENGINE = Distributed('{cluster}', 'default', 'local_table', rand());
```

The statement above defines a distributed table named `distributed_table`: it names the cluster (`{cluster}`), the database (`'default'`), and the local table that serves as the data source (`'local_table'`); `rand()` assigns each row to a random shard.

Use cases

For large-scale analytics, a single server cannot keep up with the data volume. A `Distributed` table makes it easy to manage and query data spread across many machines. It is especially useful when:

- the data volume is too large for a single instance to hold;
- the application requires high availability and fault tolerance;
- the system must scale horizontally to absorb a growing workload.

Performance tuning

To further improve response time and reduce latency, consider the following:

- Plan hardware around I/O: deploy the ClickHouse server on SSDs rather than spinning disks; SSDs bring roughly a two- to three-fold speedup [^2].
- Use pre-filtering: enabling `optimize_move_to_prewhere` filters qualifying rows earlier and lightens the downstream computation. It is on by default (`set optimize_move_to_prewhere = 1`) [^1].
- Optimize JOINs: when joining two large tables, check that the execution plan applies the expected indexes or partitioning. Use `EXPLAIN` to verify the actual execution path, and consider rewriting the right-hand side of the join as a subquery to better control its behavior [^3].
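
A short sketch of the two tuning knobs mentioned above, applied to this walkthrough's tables (EXPLAIN output formats vary by ClickHouse version):

```sql
-- pre-filtering: the setting is on by default; PREWHERE can also be written explicitly
SET optimize_move_to_prewhere = 1;
SELECT label_name FROM image_label2 PREWHERE label_id = 100;

-- inspect the plan of a distributed query
EXPLAIN SELECT * FROM image_label_all2 WHERE label_id = 100;
```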