Big Data Storage Technology: Getting Started with ClickHouse (Part 1) (2)

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

SSE 4.2 supported
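The same check can be wrapped in a function so that a provisioning script can branch on the result. This is a minimal sketch: the function name check_sse42 and its optional file argument are illustrative assumptions, added only so the check can be pointed at a file other than /proc/cpuinfo.

```shell
# check_sse42 [CPUINFO_FILE]
# Prints whether the sse4_2 flag appears in the given cpuinfo file
# (defaults to /proc/cpuinfo) and returns 0 only when it does.
# The function name and the optional argument are illustrative, not part of ClickHouse.
check_sse42() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    if [ -r "$cpuinfo" ] && grep -q sse4_2 "$cpuinfo"; then
        echo "SSE 4.2 supported"
    else
        echo "SSE 4.2 not supported"
        return 1
    fi
}
```

A script could then stop early with something like: check_sse42 || exit 1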

2. Raise the open-file limit

vim /etc/security/limits.conf

vim /etc/security/limits.d/20-nproc.conf

Append the following lines to the end of both limits.conf and 20-nproc.conf:

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072
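Editing both files by hand is easy to get wrong on a second pass, so the append can be made idempotent. The sketch below assumes the four limit lines begin with the `*` wildcard domain used by limits.conf; the function name append_limits is hypothetical.

```shell
# append_limits FILE
# Appends each limit line to FILE only if that exact line is not already present,
# so running it twice does not duplicate entries.
# Assumption: the entries use "*" (all users) as the domain field.
append_limits() {
    local file="$1" line
    for line in \
        "* soft nofile 65536" \
        "* hard nofile 65536" \
        "* soft nproc 131072" \
        "* hard nproc 131072"
    do
        # -x matches whole lines, -F treats "*" literally
        grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
    done
}
```

Run as root, this could be called once per file, for example: append_limits /etc/security/limits.conf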

Verify the changes (log in again first, since the new limits apply only to new sessions)

ulimit -a

cat /etc/security/limits.conf

cat /etc/security/limits.d/20-nproc.conf

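3. Disable SELinux

Security-Enhanced Linux (SELinux) is a security module that provides a mechanism for enforcing access-control policies; it restricts users to the policies and rules set by the system administrator.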

vim /etc/selinux/config

SELINUX=disabled

cat /etc/selinux/config
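The edit can also be scripted instead of done in vim. The following is a sketch under the assumption that the file already contains a line starting with SELINUX=; the function name disable_selinux_config is hypothetical.

```shell
# disable_selinux_config FILE
# Rewrites the SELINUX= line in FILE to SELINUX=disabled and keeps the
# previous version as FILE.bak. Lines such as SELINUXTYPE= are left alone
# because the pattern requires "=" directly after "SELINUX".
disable_selinux_config() {
    local file="$1"
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}
```

Called as root with /etc/selinux/config as the argument, this makes the same change as the manual edit. The setting is read at boot, so it takes effect after a restart.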

4. Stop the firewall

service iptables stop

service ip6tables stop

5. Install the required dependencies

yum -y install libtool

yum -y install unixODBC

6. Install ZooKeeper

See: Big Data High-Availability Technology: ZooKeeper 3.4.5 Installation and Configuration

III. ClickHouse Installation


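Official installation guide: Installation | ClickHouse Docs

Altinity installation guide: https://github.com/Altinity/clickhouse-rpm-install

Kancloud installation guide: 1.2 ClickHouse Single-Node Installation · ClickHouse · Kancloud

Community single-node deployment: Installing ClickHouse 20.8.3.18 (Single Node) on CentOS 7.5 - clickhouseclub

Community build from source: Compiling ClickHouse on CentOS 7.4 - clickhouseclub

Community cluster deployment: Building a ClickHouse Cluster from 0 to 1 - clickhouseclub

Windows 10 Docker deployment: Installing ClickHouse with Docker on Windows - Tencent Cloud Community

1. Online RPM installation

ClickHouse download (rpm packages): Index of /clickhouse/rpm/stable/x86_64/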

sudo yum -y install yum-utils

sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG

sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/clickhouse.repo

sudo yum -y install clickhouse-server

sudo yum -y install clickhouse-client

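2. Online TGZ installation (recommended)

ClickHouse download (tgz packages): https://repo.clickhouse.tech/tgz/stable

Assign the latest ClickHouse version to the variable LATEST_VERSION. At the time of writing this returned 21.10.1.8013, for which no tgz packages had been published yet.

export LATEST_VERSION=$(curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)

So the version is set manually to 21.9.2.17 here (look up version numbers on the official site).

For production, pin the latest stable version. Versions are listed at: Tags · ClickHouse/ClickHouse · GitHub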

export LATEST_VERSION=21.9.2.17

curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz

curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz

curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz

curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
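The four downloads differ only in the package name, so the URLs can be generated from the version in one place. This is a small sketch; the function name tgz_urls is hypothetical, while the base URL and package names are the ones used in the curl commands above.

```shell
# tgz_urls VERSION
# Prints one download URL per tgz package for the given version,
# in the order the packages are installed in this walkthrough.
tgz_urls() {
    local version="$1" pkg
    for pkg in clickhouse-common-static clickhouse-common-static-dbg \
               clickhouse-server clickhouse-client
    do
        echo "https://repo.clickhouse.tech/tgz/${pkg}-${version}.tgz"
    done
}
```

The output can be fed straight to curl, for example: tgz_urls "$LATEST_VERSION" | xargs -n 1 curl -O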

tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz

sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz

sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-server-$LATEST_VERSION.tgz

sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh

sudo /etc/init.d/clickhouse-server start

tar -xzvf clickhouse-client-$LATEST_VERSION.tgz

sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh

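The tgz packages fetched from the official repository turned out to be broken when extracted, so they had to be downloaded manually, uploaded to the server, and installed from there: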

Linux tgz package clickhouse-client-21.9.2.17.tgz

Linux tgz package clickhouse-common-static-21.9.2.17.tgz

Linux tgz package clickhouse-common-static-dbg-21.9.2.17.tgz

Linux tgz package clickhouse-server-21.9.2.17.tgz

Linux tgz package clickhouse-test-21.9.2.17.tgz

tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz

sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz

sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-server-$LATEST_VERSION.tgz

sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh

sudo /etc/init.d/clickhouse-server start

tar -xzvf clickhouse-client-$LATEST_VERSION.tgz

sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh

3. Offline RPM installation

mkdir -p /home/software/clickhouse

cd /home/software/clickhouse

wget http://repo.red-soft.biz/repos/clickhouse/stable/el7/clickhouse-client-1.1.54236-4.el7.x86_64.rpm

wget http://repo.red-soft.biz/repos/clickhouse/stable/el7/clickhouse-server-common-1.1.54236-4.el7.x86_64.rpm

wget http://repo.red-soft.biz/repos/clickhouse/stable/el7/clickhouse-compressor-1.1.54236-4.el7.x86_64.rpm

wget http://repo.red-soft.biz/repos/clickhouse/stable/el7/clickhouse-debuginfo-1.1.54236-4.el7.x86_64.rpm

wget http://repo.red-soft.biz/repos/clickhouse/stable/el7/clickhouse-server-1.1.54236-4.el7.x86_64.rpm

rpm -qa | grep clickhouse

rpm -Uvh *.rpm

rpm -qa | grep clickhouse

nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &

IV. ClickHouse Commands


On CentOS 7 and later, only the systemctl form of each command works.

1. Start the ClickHouse service

sudo /etc/init.d/clickhouse-server start

service clickhouse-server start

systemctl start clickhouse-server.service

2. Stop the ClickHouse service

sudo /etc/init.d/clickhouse-server stop

service clickhouse-server stop

systemctl stop clickhouse-server.service

3. Restart the ClickHouse service

sudo /etc/init.d/clickhouse-server restart

service clickhouse-server restart

systemctl restart clickhouse-server.service

4. Check the ClickHouse service status

sudo /etc/init.d/clickhouse-server status

service clickhouse-server status

systemctl status clickhouse-server.service
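Since each action has three equivalent forms depending on the init system, a helper can pick the form that exists on the current machine. The sketch below only prints the command rather than running it; the function name ch_service_cmd is hypothetical.

```shell
# ch_service_cmd ACTION
# Prints the command line that would run ACTION (start|stop|restart|status)
# for clickhouse-server, preferring systemctl, then service,
# then falling back to the init script.
ch_service_cmd() {
    local action="$1"
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl $action clickhouse-server.service"
    elif command -v service >/dev/null 2>&1; then
        echo "service clickhouse-server $action"
    else
        echo "/etc/init.d/clickhouse-server $action"
    fi
}
```

To actually run the chosen command, something like sudo $(ch_service_cmd restart) would do.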

5. Start the ClickHouse client

clickhouse-client

6. List ClickHouse processes

ps -ef | grep clickhouse

7. Kill all ClickHouse-related processes

ps -ef | grep clickhouse | grep -v grep | awk '{print $2}' | xargs kill -9
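The PID extraction in that pipeline can be isolated into a filter, which makes it possible to inspect the PIDs before killing anything. This is a sketch that reads ps -ef output on standard input; the function name clickhouse_pids is hypothetical.

```shell
# clickhouse_pids
# Reads `ps -ef` output on stdin and prints the PID (second column) of every
# line that mentions clickhouse, skipping any grep process.
clickhouse_pids() {
    awk '/clickhouse/ && !/grep/ { print $2 }'
}
```

Running ps -ef | clickhouse_pids lists the PIDs only. A gentler shutdown than kill -9 would send the default SIGTERM first, for example ps -ef | clickhouse_pids | xargs -r kill on systems with GNU xargs, and fall back to kill -9 only if processes remain.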

8. View the regular ClickHouse log

tail -n 300 /var/log/clickhouse-server/clickhouse-server.log

9. View the ClickHouse error log

tail -n 300 /var/log/clickhouse-server/clickhouse-server.err.log
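When only failures matter, the regular log can be filtered by level instead of reading the last 300 lines. This sketch assumes the log lines carry a level tag such as <Error>, as the ClickHouse server log does; the function name recent_errors is hypothetical.

```shell
# recent_errors LOGFILE [COUNT]
# Prints the last COUNT (default 20) lines of LOGFILE that contain
# the <Error> level tag.
recent_errors() {
    local logfile="$1" count="${2:-20}"
    grep -F '<Error>' "$logfile" | tail -n "$count"
}
```

For example: recent_errors /var/log/clickhouse-server/clickhouse-server.log 50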

10. Disable ClickHouse start on boot (non-production environments)

sudo systemctl disable clickhouse-server

11. View the ClickHouse cluster configuration

clickhouse-client -u default --password "" --query "SELECT * FROM system.clusters"

12. List all ClickHouse tables

echo "SELECT database,name,engine FROM system.tables WHERE database != 'system'" | clickhouse-client

13. View memory usage by system user

ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' | clickhouse local -S "user String, memory Float64" -q "SELECT user, round(sum(memory), 2) as memoryTotal FROM table GROUP BY user ORDER BY memoryTotal DESC FORMAT Pretty"
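To make it clearer what the SQL in that pipeline computes, here is the same aggregation written with awk and sort only. It is a sketch for illustration, not a replacement for clickhouse local; the function name mem_by_user is hypothetical, and the output is plain tab-separated text rather than the Pretty format.

```shell
# mem_by_user
# Reads "user<TAB>mem_percent" lines on stdin, sums the percentages per user,
# and prints "user<TAB>total" rounded to two decimals, largest total first
# (the GROUP BY user / ORDER BY memoryTotal DESC of the SQL query).
mem_by_user() {
    LC_ALL=C awk -F '\t' '
        { total[$1] += $2 }
        END { for (u in total) printf("%s\t%.2f\n", u, total[u]) }
    ' | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

It takes the same input as the clickhouse local step: ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' | mem_by_user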

14. Measure ClickHouse query performance

echo "SELECT * FROM system.numbers LIMIT 1000" | clickhouse-benchmark -i 5 -h localhost -h localhost

15. Change the ClickHouse user password

① Method 1: edit /etc/clickhouse-server/users.xml

vim /etc/clickhouse-server/users.xml

<password>123456</password>

② Method 2: edit /etc/clickhouse-client/config.xml

vim /etc/clickhouse-client/config.xml

<user>username</user>

<password>password</password>

<secure>False</secure>
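Storing the password in plain text in users.xml is the simplest option. users.xml also accepts a SHA-256 digest in a password_sha256_hex element in place of the plain-text password element, and the digest can be produced as sketched below. The function name password_sha256_hex is chosen here to mirror that element name, and the sketch assumes sha256sum from coreutils is installed.

```shell
# password_sha256_hex PASSWORD
# Prints the lowercase hex SHA-256 digest of PASSWORD.
# printf '%s' is used so that no trailing newline is hashed.
password_sha256_hex() {
    printf '%s' "$1" | sha256sum | awk '{ print $1 }'
}
```

The printed value would go inside <password_sha256_hex>…</password_sha256_hex> for the user in question.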

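V. ClickHouse Cluster


ClickHouse cluster deployment: Tutorial | ClickHouse Docs

ClickHouse cluster configuration: https://clickhouse.tech/docs/zh/operations/configuration-files

ClickHouse replicated engines: Data Replication | ClickHouse Docs

ClickHouse distributed configuration: Distributed | ClickHouse Docs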

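This cluster is configured with both shards and replicas. In ClickHouse, only tables in the MergeTree family support replication.

Replicas provide high availability and fault tolerance; shards provide horizontal scaling of the data.

A table created on one ClickHouse node exists only on that node. To use a replicated table, specify a table engine with the Replicated prefix at creation time, then create the same table on every node.

Replication synchronizes data only, not table structure, so the table has to be created by hand on each machine.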
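Creating the same table on every node is easier when the statement is generated once and reused. The sketch below prints such a statement; the function name replicated_ddl, the two columns, and the ZooKeeper path layout are illustrative assumptions, while {shard} and {replica} are the macros that each node resolves from its own configuration.

```shell
# replicated_ddl DATABASE TABLE
# Prints a CREATE TABLE statement for a ReplicatedMergeTree table.
# {shard} and {replica} are left literal on purpose: ClickHouse substitutes
# them from the <macros> section of the node that runs the statement.
# The columns and the ZooKeeper path are hypothetical examples.
replicated_ddl() {
    local db="$1" table="$2"
    cat <<EOF
CREATE TABLE IF NOT EXISTS ${db}.${table}
(
    event_date Date,
    id UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/${db}/${table}', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY id
EOF
}
```

The same text can then be sent to each node in turn, for example by piping replicated_ddl default events into clickhouse-client --host with each hostname inside a for loop.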

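Configuration file that is identical on every machine: /etc/clickhouse-server/config.xml (it is not identical if the external metrika.xml is not included)

Configuration file that differs between machines: /etc/metrika.xml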

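1. ClickHouse directories and files

ClickHouse directories and files

| Item | Path |
| --- | --- |
| Data directory | /var/lib/clickhouse |
| Log directory | /var/log/clickhouse-server |
| Default shard/cluster configuration | /etc/metrika.xml |
| Server configuration file | /etc/clickhouse-server/config.xml |
| Client configuration file | /etc/clickhouse-client/config.xml |
| Cron job configuration | /etc/cron.d/clickhouse-server |
| systemd service unit | /etc/systemd/system/clickhouse-server.service |
| File-handle limit configuration | /etc/security/limits.d/clickhouse.conf |
| Main executable | /usr/bin/clickhouse |
| Client executable | /usr/bin/clickhouse-client |
| Server executable | /usr/bin/clickhouse-server |
| Compression tool executable | /usr/bin/clickhouse-compressor |
| Server log file | /var/log/clickhouse-server/clickhouse-server.log |
| Server error log file | /var/log/clickhouse-server/clickhouse-server.err.log |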

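2. ClickHouse cluster plan

ClickHouse cluster plan

| ZooKeeper / ClickHouse host | Shard | Replica |
| --- | --- | --- |
| hadoop001 | shard01 | replica_01_02 |
| hadoop002 | shard02 | replica_02_02 |
| hadoop003 | shard03 | replica_03_02 |
| hadoop004 | shard01 | replica_01_01 |
| hadoop005 | shard02 | replica_02_01 |
| hadoop006 | shard03 | replica_03_01 |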
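A plan like this is worth checking mechanically before it is copied into six configuration files. The sketch below holds the plan as plain text and counts replicas per shard; the function names cluster_plan and replicas_per_shard are hypothetical.

```shell
# cluster_plan
# Prints the plan as "host shard replica", one node per line.
cluster_plan() {
    cat <<'EOF'
hadoop001 shard01 replica_01_02
hadoop002 shard02 replica_02_02
hadoop003 shard03 replica_03_02
hadoop004 shard01 replica_01_01
hadoop005 shard02 replica_02_01
hadoop006 shard03 replica_03_01
EOF
}

# replicas_per_shard
# Reads "host shard replica" lines on stdin and prints "shard count",
# sorted by shard name.
replicas_per_shard() {
    awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}
```

Running cluster_plan | replicas_per_shard should report a count of 2 for each of the three shards; any other number points to a typo in the plan.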

3. ClickHouse core configuration

cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.init

chmod 664 /etc/clickhouse-server/config.xml

chown -R clickhouse:clickhouse /etc/clickhouse-server

vim /etc/clickhouse-server/config.xml

<listen_host>0.0.0.0</listen_host>

<path>/home/clickhouse/data/</path>

<tmp_path>/home/clickhouse/tmp/</tmp_path>

<user_files_path>/home/clickhouse/data/user_files/</user_files_path>

mkdir -p /home/clickhouse/data/

mkdir -p /home/clickhouse/tmp/

mkdir -p /home/clickhouse/data/user_files/

chown -R clickhouse:clickhouse /home/clickhouse/
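A server that points at a directory that does not exist will fail at startup, so it can help to compare the paths in the configuration against the file system. This is a sketch that assumes each path element sits on a single line; the function name missing_config_dirs is hypothetical.

```shell
# missing_config_dirs CONFIG_FILE
# Prints every directory named in a <path>, <tmp_path> or <user_files_path>
# element of CONFIG_FILE that does not exist on disk.
# Assumption: opening tag, value and closing tag are on the same line.
missing_config_dirs() {
    local config="$1" dir
    sed -E -n 's:.*<(path|tmp_path|user_files_path)>(.*)</\1>.*:\2:p' "$config" |
    while IFS= read -r dir; do
        [ -d "$dir" ] || echo "$dir"
    done
}
```

Empty output from missing_config_dirs /etc/clickhouse-server/config.xml means all three directories are in place.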

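4. ClickHouse cluster configuration (rpm version)

**Note:** With the cluster configured, the {shard} and {replica} macros can be used when creating distributed tables, which makes table creation easier. You can also hard-code your own shard and replica values at table creation, and they are not limited to the values in the cluster configuration; developers can define them freely. The cluster table metadata is stored in ZooKeeper. Changing the cluster configuration of an existing cluster table may leave the table read-only; in that case, check in ZooKeeper whether the ClickHouse metadata still matches the current table.

In the rpm-installed version, the default server configuration /etc/clickhouse-server/config.xml indicates that /etc/metrika.xml is loaded by default as the substitution file for the remote servers. Here the file is placed manually in a different directory.

The default cluster definition is the one named by the incl attribute, clickhouse_remote_servers:

<remote_servers incl="clickhouse_remote_servers" />

Manually create the shard-and-replica cluster file metrika.xml in the /etc/clickhouse-server/config.d/ directory: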

chmod 664 /etc/clickhouse-server/config.d/metrika.xml

chown clickhouse:clickhouse /etc/clickhouse-server/config.d/metrika.xml

vim /etc/clickhouse-server/config.d/metrika.xml

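The configuration file differs on each machine in the cluster; the difference lies in the <shard> and <replica> tags inside <macros>.

See the ClickHouse cluster plan above for details.

<yandex>
    <clickhouse_remote_servers>
        <cluster_3shards_2replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop004</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>hadoop001</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop005</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>hadoop002</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop006</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>hadoop003</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_3shards_2replicas>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node>
            <host>hadoop001</host>
            <port>2181</port>
        </node>
        <node>
            <host>hadoop002</host>
            <port>2181</port>
        </node>
        <node>
            <host>hadoop003</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <shard>shard01</shard>
        <replica>replica_01_02</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>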

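Only the <macros> section differs between machines. For this 3-shard, 2-replica setup, the per-node values are as follows (hosts in the order of the cluster plan above):

hadoop001: <shard>shard01</shard> <replica>replica_01_02</replica>

hadoop002: <shard>shard02</shard> <replica>replica_02_02</replica>

hadoop003: <shard>shard03</shard> <replica>replica_03_02</replica>

hadoop004: <shard>shard01</shard> <replica>replica_01_01</replica>

hadoop005: <shard>shard02</shard> <replica>replica_02_01</replica>

hadoop006: <shard>shard03</shard> <replica>replica_03_01</replica>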
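Because the macros section is the only part that changes per machine, it can be generated from the hostname instead of edited six times. The sketch below hard-codes the mapping from the cluster plan in this article; the function name macros_for_host is hypothetical.

```shell
# macros_for_host HOSTNAME
# Prints the <macros> section for the given host according to the
# 3-shard, 2-replica plan used in this article. Returns 1 for unknown hosts.
macros_for_host() {
    local shard replica
    case "$1" in
        hadoop001) shard=shard01; replica=replica_01_02 ;;
        hadoop002) shard=shard02; replica=replica_02_02 ;;
        hadoop003) shard=shard03; replica=replica_03_02 ;;
        hadoop004) shard=shard01; replica=replica_01_01 ;;
        hadoop005) shard=shard02; replica=replica_02_01 ;;
        hadoop006) shard=shard03; replica=replica_03_01 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
    printf '<macros>\n    <shard>%s</shard>\n    <replica>%s</replica>\n</macros>\n' \
        "$shard" "$replica"
}
```

On each machine, macros_for_host "$(hostname)" prints the block to paste into that machine's metrika.xml, assuming the hostname matches the names in the plan.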

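Print the macros section (lines 78 to 81 in the author's file; adjust the range to match yours):

sed -n '78,81p' /etc/clickhouse-server/config.d/metrika.xml

clickhouse-client -u default --password "" --query "SELECT * FROM system.clusters"

After configuration, once the cluster has been used for the first time, ClickHouse writes this machine's settings to a macros file in its root directory (here /home/clickhouse):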

cat /home/clickhouse/macros
