Building a Highly Available ClickHouse Cluster: 2 Shards, 2 Replicas

This post walks through deploying a highly available ClickHouse cluster on two machines: installing a ZooKeeper ensemble (multiple instances and their configuration), then installing ClickHouse, editing its configuration files, starting the cluster, verifying it, and creating distributed tables.


Deployment and Installation

This setup is based on the following two machines:

(image: machine list)

I. Install the ZooKeeper cluster

Because resources are limited, the three ZooKeeper instances are spread across the two machines: one instance on the first machine and two on the second.

1. Download the package

Download from the official site: https://zookeeper.apache.org/releases.html

2. Extract

Install on the data disk:

cd /data;
# create a directory for zookeeper and enter it
mkdir zookeeper
cd zookeeper
# upload the package (rz is provided by lrzsz)
rz
tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz

3. Directory layout after extraction

(image: extracted directory listing)

4. Configuration

cd /data/zookeeper/apache-zookeeper-3.7.0-bin/conf;
cp zoo_sample.cfg zoo.cfg;
vim zoo.cfg;

Point the data and log directories at:
dataDir=/data/zookeeper/apache-zookeeper-3.7.0-bin/data
dataLogDir=/data/zookeeper/apache-zookeeper-3.7.0-bin/log
The clientPort values of my three instances are 2181, 2182, and 2183 respectively.

zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/apache-zookeeper-3.7.0-bin/data
dataLogDir=/data/zookeeper/apache-zookeeper-3.7.0-bin/log
clientPort=2181
maxClientCnxns=600
maxSessionTimeout=120000
minSessionTimeout=30000
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
server.1=ip1:2887:3887
server.2=ip2:2888:3888
server.3=ip2:2889:3889
Create the myid file:
cd /data/zookeeper/apache-zookeeper-3.7.0-bin/data;
touch myid;
# the content is this server's number from the server.N entries at the bottom of zoo.cfg, e.g. 1 for server.1
vim myid;   # add 1
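
On node02, which hosts two instances, the second instance needs its own directory tree, clientPort, and myid. A minimal sketch, assuming the extra instance (server.3, clientPort 2183) lives under /data/zookeeper/zk3 (the directory name is an illustrative assumption, not from the original setup):

# On node02 only: set up the second ZooKeeper instance (server.3, clientPort 2183)
cd /data/zookeeper
cp -r apache-zookeeper-3.7.0-bin zk3
# give it its own client port and data/log directories
sed -i 's|^clientPort=.*|clientPort=2183|' zk3/conf/zoo.cfg
sed -i 's|^dataDir=.*|dataDir=/data/zookeeper/zk3/data|' zk3/conf/zoo.cfg
sed -i 's|^dataLogDir=.*|dataLogDir=/data/zookeeper/zk3/log|' zk3/conf/zoo.cfg
# myid must match the server.N entry this instance represents
mkdir -p /data/zookeeper/zk3/data
echo 3 > /data/zookeeper/zk3/data/myid
sh zk3/bin/zkServer.sh start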

5. Handy commands

# start zk
sh bin/zkServer.sh start
# check status
sh bin/zkServer.sh status
# stop
sh bin/zkServer.sh stop
# connect with the client
sh bin/zkCli.sh -server 127.0.0.1:2181
# force-kill the process if it will not stop
fuser -v -n tcp 2181
kill -9 <pid>
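
Once all three instances are up, confirm that the ensemble has elected a leader. A quick check, assuming nc is installed (the srvr four-letter command is whitelisted by default in ZooKeeper 3.5+); replace ip1/ip2 with the machines' real addresses:

# Ask each instance for its role; expect one "Mode: leader" and two "Mode: follower".
for endpoint in ip1:2181 ip2:2182 ip2:2183; do
  host=${endpoint%:*}; port=${endpoint#*:}
  echo "== $endpoint =="
  echo srvr | nc "$host" "$port" | grep Mode
done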

II. Install the ClickHouse cluster

Four ClickHouse instances are deployed across the two machines: 2 shards, each with 2 replicas.
Architecture diagram:

(image: 2-shard, 2-replica architecture diagram)

Map the hostnames in /etc/hosts:
ip1 node01
ip2 node02
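
For example (run on both machines; ip1 and ip2 stand in for the real addresses):

# Append the hostname mappings on node01 and node02
cat >> /etc/hosts <<'EOF'
ip1 node01
ip2 node02
EOF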

1. Download and install the packages

See the official quick start: https://clickhouse.com/#quick-start

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client

sudo /etc/init.d/clickhouse-server start
clickhouse-client

2. Modify the configuration files

cd /etc/clickhouse-server/

Each machine runs two ClickHouse instances, one on TCP port 9000 and one on 9001.

config.xml

The stock file contains a lot of comments; I removed them, so you can copy the version below and adjust it.

<?xml version="1.0"?>
<yandex>
    <logger>
        <level>trace</level>
        <log>/data/logs/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/data/logs/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>200M</size>
        <count>5</count>
    </logger>

    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>

    <mysql_port>9004</mysql_port>

    <postgresql_port>9005</postgresql_port>

    <interserver_http_port>9009</interserver_http_port>
    <interserver_http_host>node01</interserver_http_host>

    <listen_host>0.0.0.0</listen_host>

    <include_from>/etc/clickhouse-server/metrika.xml</include_from>
    <remote_servers incl="clickhouse_remote_servers" optional="true"/>
    <zookeeper incl="zookeeper_servers" optional="true"/>
    <macros incl="macros" optional="true"/>

    <max_connections>4096</max_connections>

    <keep_alive_timeout>3</keep_alive_timeout>
    <database_atomic_delay_before_drop_table_sec>0</database_atomic_delay_before_drop_table_sec>

    <grpc>
        <enable_ssl>false</enable_ssl>
        <ssl_cert_file>/path/to/ssl_cert_file</ssl_cert_file>
        <ssl_key_file>/path/to/ssl_key_file</ssl_key_file>
        <ssl_require_client_auth>false</ssl_require_client_auth>
        <ssl_ca_cert_file>/path/to/ssl_ca_cert_file</ssl_ca_cert_file>
        <compression>deflate</compression>
        <compression_level>medium</compression_level>
        <max_send_message_size>-1</max_send_message_size>
        <max_receive_message_size>-1</max_receive_message_size>
        <verbose_logs>false</verbose_logs>
    </grpc>

    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source and secured Zookeeper communication -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>
    <max_concurrent_queries>100</max_concurrent_queries>
    <max_server_memory_usage>0</max_server_memory_usage>
    <max_thread_pool_size>10000</max_thread_pool_size>
    <max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>
    <total_memory_profiler_step>4194304</total_memory_profiler_step>
    <total_memory_tracker_sample_probability>0</total_memory_tracker_sample_probability>
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <mark_cache_size>5368709120</mark_cache_size>
    <mmap_cache_size>1000</mmap_cache_size>
    <compiled_expression_cache_size>1073741824</compiled_expression_cache_size>

    <!-- Path to data directory, with trailing slash. -->
    <path>/data/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/clickhouse/tmp/</tmp_path>


    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/data/clickhouse/user_files/</user_files_path>

    <!-- LDAP server definitions. -->
    <ldap_servers>

    </ldap_servers>

    <user_directories>
        <users_xml>
            <!-- Path to configuration file with predefined users. -->
            <path>users.xml</path>
        </users_xml>
        <local_directory>
            <!-- Path to folder where users created by SQL commands are stored. -->
            <path>/data/clickhouse/access/</path>
        </local_directory>

    </user_directories>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- Comma-separated list of prefixes for user-defined settings. -->
    <custom_settings_prefixes></custom_settings_prefixes>

    <default_database>default</default_database>

    <timezone>Asia/Shanghai</timezone>

    <mlock_executable>true</mlock_executable>

    <!-- Reallocate memory for machine code ("text") using huge pages. Highly experimental. -->
    <remap_executable>false</remap_executable>

    <remote_servers>
    </remote_servers>

    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
    <max_session_timeout>3600</max_session_timeout>
    <default_session_timeout>60</default_session_timeout>
    <query_log>

        <database>system</database>
        <table>query_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>

        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>

    <!-- Trace log. Stores stack traces collected by query profilers.
         See query_profiler_real_time_period_ns and query_profiler_cpu_time_period_ns settings. -->
    <trace_log>
        <database>system</database>
        <table>trace_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </trace_log>

    <!-- Query thread log. Has information about all threads participated in query execution.
         Used only for queries with setting log_query_threads = 1. -->
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>
    <metric_log>
        <database>system</database>
        <table>metric_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <collect_interval_milliseconds>1000</collect_interval_milliseconds>
    </metric_log>

    <!--
        Asynchronous metric log contains values of metrics from
        system.asynchronous_metrics.
    -->
    <asynchronous_metric_log>
        <database>system</database>
        <table>asynchronous_metric_log</table>
        <!--
            Asynchronous metrics are updated once a minute, so there is
            no need to flush more often.
        -->
        <flush_interval_milliseconds>7000</flush_interval_milliseconds>
    </asynchronous_metric_log>

    <!--
        OpenTelemetry log contains OpenTelemetry trace spans.
    -->
    <opentelemetry_span_log>
        <engine>
            engine MergeTree
            partition by toYYYYMM(finish_date)
            order by (finish_date, finish_time_us, trace_id)
        </engine>
        <database>system</database>
        <table>opentelemetry_span_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </opentelemetry_span_log>
    <crash_log>
        <database>system</database>
        <table>crash_log</table>

        <partition_by />
        <flush_interval_milliseconds>1000</flush_interval_milliseconds>
    </crash_log>
    <top_level_domains_lists>
    </top_level_domains_lists>
    <dictionaries_config>*_dictionary.xml</dictionaries_config>
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <query_masking_rules>
        <rule>
            <name>hide encrypt/decrypt arguments</name>
            <regexp>((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)</regexp>
            <replace>\1(???)</replace>
        </rule>
    </query_masking_rules>

    <send_crash_reports>
        <enabled>false</enabled>
        <anonymize>false</anonymize>
        <endpoint>https://6f33034cfe684dd7a3ab9875e57b1c8d@o388870.ingest.sentry.io/5226277</endpoint>
    </send_crash_reports>
</yandex>
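
Because this config moves the server away from the default /var/lib and /var/log locations, the /data directories it references must exist and be owned by the clickhouse user before the first start. A minimal sketch:

# Create the data and log directories referenced in config.xml
mkdir -p /data/clickhouse/tmp /data/clickhouse/user_files /data/clickhouse/access
mkdir -p /data/logs/clickhouse-server
chown -R clickhouse:clickhouse /data/clickhouse /data/logs/clickhouse-server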
metrika.xml
<?xml version="1.0"?>
<yandex>
    <clickhouse_remote_servers>
      <cluster_2shards_2replicas>
        <shard>
            <internal_replication>true</internal_replication>
            <weight>1</weight>
            <replica>
                <host>node01</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node02</host>
                <port>9001</port>
            </replica>
        </shard>

        <shard>
            <internal_replication>true</internal_replication>
            <weight>1</weight>
            <replica>
                <host>node02</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node01</host>
                <port>9001</port>
            </replica>
        </shard>
      </cluster_2shards_2replicas>
    </clickhouse_remote_servers>

    <!-- Macros defining the shard and replica names used by replicated tables; each of the four instances needs its own values -->
    <macros>
        <shard>01</shard>
        <replica>cluster01-01-01</replica>
    </macros>

    <zookeeper_servers>
        <node index="1">
            <host>node01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>node02</host>
            <port>2182</port>
        </node>
        <node index="3">
            <host>node02</host>
            <port>2183</port>
        </node>
        <session_timeout_ms>60000</session_timeout_ms>
        <operation_timeout_ms>30000</operation_timeout_ms>
    </zookeeper_servers>

   <networks>
     <ip>::/0</ip>
   </networks>

   <!-- Data compression settings for MergeTree tables -->
   <clickhouse_compression>
     <case>
       <!-- minimum size of a data part -->
       <min_part_size>10000000000</min_part_size>
       <!-- minimum ratio of the part size to the table size -->
       <min_part_size_ratio>0.01</min_part_size_ratio>
       <!-- compression method: lz4 or zstd -->
       <method>lz4</method>
     </case>
   </clickhouse_compression>

</yandex>
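
One caveat: the <macros> block is the only part of metrika.xml that must differ between the four instances, because ReplicatedMergeTree builds its ZooKeeper paths from {shard} and {replica}. Replicas of the same shard share the shard value but need unique replica names. A possible assignment extending the author's cluster01-01-01 naming (the exact names are an illustrative assumption), plus a query to check what each instance picked up:

# Per-instance <macros> values (illustrative):
#   node01, port 9000  ->  <shard>01</shard>  <replica>cluster01-01-01</replica>
#   node02, port 9001  ->  <shard>01</shard>  <replica>cluster01-01-02</replica>
#   node02, port 9000  ->  <shard>02</shard>  <replica>cluster01-02-01</replica>
#   node01, port 9001  ->  <shard>02</shard>  <replica>cluster01-02-02</replica>
# Verify what a running instance actually sees:
clickhouse-client --host node01 --port 9000 -q "SELECT * FROM system.macros"
clickhouse-client --host node01 --port 9001 -q "SELECT * FROM system.macros"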
users.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <load_balancing>random</load_balancing>
        </default>

        <readonly>
            <readonly>1</readonly>
        </readonly>
    </profiles>

    <users>
        <default>
            <password></password>
            <networks>
               <!-- <ip>::/0</ip>-->
                <ip>127.0.0.1</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
        <guest>
            <password>guest</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </guest>

    </users>

    <quotas>
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</yandex>
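
With this users.xml, the default account only accepts connections from 127.0.0.1, while guest (password guest) is read-only and reachable from anywhere. A quick check once the servers are running:

# default user works locally
clickhouse-client --port 9000 -q "SELECT currentUser()"
# guest user works remotely and is read-only
clickhouse-client --host node01 --port 9000 --user guest --password guest -q "SELECT 1"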

3. Start and verify

# start, option 1: systemd
systemctl start clickhouse-server
systemctl start clickhouse-server9001
# stop
systemctl stop clickhouse-server
systemctl stop clickhouse-server9001
# check the service status
systemctl status clickhouse-server
clickhouse-client
# start, option 2: run in the foreground
sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml

# check the listening ports
ss -ntpl | grep 900
# log in to each instance
clickhouse-client --port 9000
clickhouse-client --port 9001
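
Once all four instances are up, confirm that they see the cluster defined in metrika.xml and can reach ZooKeeper (system.clusters and system.zookeeper are standard ClickHouse system tables):

# the cluster should show 2 shards x 2 replicas
clickhouse-client --port 9000 -q "
  SELECT cluster, shard_num, replica_num, host_name, port
  FROM system.clusters
  WHERE cluster = 'cluster_2shards_2replicas'"
# if ZooKeeper is reachable, this lists the root znodes instead of raising an error
clickhouse-client --port 9000 -q "SELECT name FROM system.zookeeper WHERE path = '/'"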

To run a second instance, besides repeating the steps above against a second config file (config9001.xml, referenced below), you also need a second systemd unit:

cd /etc/systemd/system/
cp clickhouse-server.service clickhouse-server9001.service
vim clickhouse-server9001.service;

clickhouse-server9001.service:

[Unit]
Description=ClickHouse Server (analytic DBMS for big data)
Requires=network-online.target
After=network-online.target

[Service]
Type=simple
User=clickhouse
Group=clickhouse
Restart=always
RestartSec=30
RuntimeDirectory=clickhouse-server
ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config9001.xml --pid-file=/run/clickhouse-server/clickhouse-server9001.pid
LimitCORE=infinity
LimitNOFILE=500000
CapabilityBoundingSet=CAP_NET_ADMIN CAP_IPC_LOCK CAP_SYS_NICE

[Install]
WantedBy=multi-user.target
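
The unit above expects /etc/clickhouse-server/config9001.xml. One way to produce it is to copy config.xml and change everything that would collide with the first instance: the ports, the data paths, and the log paths. A rough sketch with sed; the replacement port numbers and directory names are assumptions, pick any free values:

cd /etc/clickhouse-server
cp config.xml config9001.xml
# ports: 9000->9001, 8123->8124, 9004->9014, 9005->9015, 9009->9010 (assumed free)
sed -i 's|<tcp_port>9000<|<tcp_port>9001<|' config9001.xml
sed -i 's|<http_port>8123<|<http_port>8124<|' config9001.xml
sed -i 's|<mysql_port>9004<|<mysql_port>9014<|' config9001.xml
sed -i 's|<postgresql_port>9005<|<postgresql_port>9015<|' config9001.xml
sed -i 's|<interserver_http_port>9009<|<interserver_http_port>9010<|' config9001.xml
# separate data and log directories for the second instance
sed -i 's|/data/clickhouse/|/data/clickhouse9001/|g' config9001.xml
sed -i 's|/data/logs/clickhouse-server/|/data/logs/clickhouse-server9001/|g' config9001.xml
mkdir -p /data/clickhouse9001 /data/logs/clickhouse-server9001
chown -R clickhouse:clickhouse /data/clickhouse9001 /data/logs/clickhouse-server9001
# the 9001 instance also needs its own <macros> values (see the note under metrika.xml)
systemctl daemon-reload
systemctl start clickhouse-server9001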

At this point the 2-shard, 2-replica ClickHouse cluster is installed.

Testing

-- For cluster-wide operations, add "on cluster <cluster_name>" to everything except creating the Distributed table itself.
-- Create the database on every node
create database if not exists indicator on cluster cluster_2shards_2replicas engine=Ordinary;

-- Create the local (replicated) table on every node.
-- The company_id column and the ORDER BY key are illustrative assumptions; only create_time appeared in the original.
create table if not exists indicator.indicator_operating_local on cluster cluster_2shards_2replicas
(
    `company_id` UInt64,
    `create_time` DateTime default now()
)
engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/indicator_operating_local', '{replica}')
order by create_time;

-- Create the Distributed table that fans queries out across the shards
create table indicator.indicator_operating_all as indicator.indicator_operating_local
engine = Distributed(cluster_2shards_2replicas, indicator, indicator_operating_local, sipHash64(company_id));
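
A quick way to watch sharding and replication in action (company_id is the same assumed column as above; distributed inserts are flushed asynchronously, so give the counts a moment):

# write through the Distributed table
clickhouse-client --port 9000 -q "INSERT INTO indicator.indicator_operating_all (company_id) VALUES (1001), (1002), (1003), (1004)"
# each shard holds part of the data ...
clickhouse-client --host node01 --port 9000 -q "SELECT count() FROM indicator.indicator_operating_local"
clickhouse-client --host node02 --port 9000 -q "SELECT count() FROM indicator.indicator_operating_local"
# ... and the Distributed table returns everything
clickhouse-client --port 9000 -q "SELECT count() FROM indicator.indicator_operating_all"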

This is my personal summary of researching and setting up the cluster from scratch; if anything is wrong, please leave a comment and point it out.
The next post will cover ClickHouse GUI clients and cluster monitoring.
