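I. Deployment Overview
Three machines are used in total; the database can also be deployed on a separate machine.
An additional Ansible control host is needed (I use my own laptop here); installing Ansible is not covered.
Inventory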
[cloudera_manager]
192.168.31.102 hostname=cdh01
[cloudera_agent]
192.168.31.102 hostname=cdh01
192.168.31.103 hostname=cdh02
192.168.31.104 hostname=cdh03
[db_mysql]
192.168.31.102 hostname=cdh01
[all:children]
cloudera_manager
cloudera_agent
db_mysql
[all:vars]
ansible_connection=ssh
ansible_ssh_user=ubuntu
ansible_ssh_pass='123456'
ansible_sudo_pass='123456'
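The overall steps are:
1. Initialize the operating system and apply some basic configuration
2. Install and configure the database, and create the databases that Cloudera Manager needs
3. Set up an offline installation repository
4. Install Cloudera Manager
5. Install the Cloudera agents
6. Open the CDH management page and create the cluster
II. Role Overview
This section describes the Ansible roles used.
The directory skeleton of a role can be generated automatically with ansible-galaxy, for example:
ansible-galaxy init common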
common
The directory structure of the common role:
erniudeMacBook-Pro:csdn erniu$ tree common
common
├── README.md
├── defaults
│   └── main.yml
├── files
│   ├── disable-thp.service
│   └── history.sh
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml
The main task file, tasks/main.yml:
---
- name: Set time info in history command
  copy:
    src: history.sh
    dest: /etc/profile.d/
    mode: '0644'
- name: Stop ufw
  ufw:
    state: disabled
- name: Set hostname
  hostname:
    name: "{{ hostname }}"
- name: Set timezone to Shanghai
  timezone:
    name: Asia/Shanghai
- name: Disable Transparent Huge Pages
  block:
    - name: Upload disable-thp.service file
      copy:
        src: disable-thp.service
        dest: /etc/systemd/system/
    - name: Add disable-thp to system
      systemd:
        name: disable-thp
        state: started
        enabled: yes
        # Reload systemd so the newly uploaded unit file is picked up
        daemon_reload: yes
# vm.swappiness=0 does not turn swap off; it tells the kernel to avoid swapping where it can
- name: Minimize swapping
  sysctl:
    name: vm.swappiness
    value: '0'
- name: Apt update
  apt:
    update_cache: true
- name: Install common utils
  apt:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - ntp
      - language-pack-zh-hans
      - language-pack-zh-hans-base
files/history.sh
export HISTTIMEFORMAT="[%F %T] "
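HISTTIMEFORMAT is interpreted by bash as a strftime-style format, so the same string can be previewed with date before it is rolled out. A minimal sketch; preview_hist_format is a hypothetical helper name and is not part of the role:

```shell
# Preview how the format string used in history.sh renders the current time.
# preview_hist_format is a made-up name for illustration only.
preview_hist_format() {
    # %F expands to YYYY-MM-DD and %T to HH:MM:SS
    date +"[%F %T] "
}
```

Calling preview_hist_format prints the prefix that each entry of the history command would carry in a new login shell.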
files/disable-thp.service
[Unit]
Description=Disable Transparent Huge Pages (THP)
[Service]
Type=simple
ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag"
[Install]
WantedBy=multi-user.target
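To confirm that the unit took effect, the active THP mode can be read back from sysfs, where the kernel marks the current choice with square brackets. A minimal sketch; thp_mode is a hypothetical helper name, and the optional file argument is there only so the function can be tried against a sample file:

```shell
# Print the active Transparent Huge Pages mode, i.e. the word in [brackets].
# thp_mode is a made-up helper name, not something shipped with the role.
thp_mode() {
    # $1 = file to read; defaults to the "enabled" knob written by the unit above
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

On a host where disable-thp.service has run, calling thp_mode with no argument should print never; passing /sys/kernel/mm/transparent_hugepage/defrag checks the second knob.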
A playbook that applies the role is also needed; it sits at the same level as the roles directory
common.yml
---
- name: Init OS
  hosts: '{{ hosts }}'
  gather_facts: true
  become: true
  become_method: sudo
  roles:
    - common
mysql
The directory structure of the mysql role:
erniudeMacBook-Pro:roles erniu$ tree mysql
mysql
├── README.md
├── defaults
│   └── main.yml
├── files
│   └── my.cnf
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   ├── configure.yml
│   ├── install.yml
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml
tasks/main.yml
---
- include_tasks: install.yml
- include_tasks: configure.yml
tasks/install.yml
---
- name: Install Mysql
  apt:
    name:
      - python3-mysqldb
      - mysql-server
      - mysql-client
      - libmysqlclient-dev
    state: present
- name: Start the MySQL service
  systemd:
    name: mysql
    state: started
    enabled: true
tasks/configure.yml
---
- name: Stop mysql
  systemd:
    name: mysql
    state: stopped
# The next few steps move some of the default files to another directory
- name: Create a backup dir
  file:
    path: /opt/backup/mysql
    state: directory
- name: Copy old InnoDB logfile0
  copy:
    src: /var/lib/mysql/ib_logfile0
    dest: /opt/backup/mysql/
    remote_src: yes
    force: no
- name: Copy old InnoDB logfile1
  copy:
    src: /var/lib/mysql/ib_logfile1
    dest: /opt/backup/mysql/
    remote_src: yes
    force: no
- name: Remove old InnoDB logfile0
  file:
    path: /var/lib/mysql/ib_logfile0
    state: absent
- name: Remove old InnoDB logfile1
  file:
    path: /var/lib/mysql/ib_logfile1
    state: absent
# Upload the custom configuration file; the configuration recommended by the official CDH documentation can be used
# This task has a notify, so a handler is needed
- name: Upload my.cnf
  copy:
    src: my.cnf
    dest: /etc/my.cnf
    mode: '0644'
    owner: root
    group: root
  notify: Restart mysql
handlers/main.yml
---
- name: Restart mysql
  systemd:
    name: mysql
    state: restarted
files/my.cnf
The official documentation stresses that explicit_defaults_for_timestamp must be set to 0; otherwise an error is reported
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
character-set-server = utf8
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
key_buffer_size = 32M
max_allowed_packet = 16M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 600
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
# cloudera
explicit_defaults_for_timestamp=0
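Because a wrong value for explicit_defaults_for_timestamp only shows up later as an error, it can be worth checking the file before uploading it. A minimal sketch that reads one option from a my.cnf-style file; cnf_value is a hypothetical helper name, and it ignores section headers, so it assumes the option appears only once in the file:

```shell
# Print the value assigned to one option in a my.cnf-style file.
# cnf_value is a made-up helper; it does not distinguish between [sections].
cnf_value() {
    # $1 = config file, $2 = option name
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }                        # skip comment lines
        {
            name = $1
            gsub(/[ \t]/, "", name)                # drop spaces around the option name
            if (name == key) {
                value = $2
                gsub(/^[ \t]+|[ \t]+$/, "", value) # trim spaces around the value
                found = 1
            }
        }
        END {
            if (!found) exit 1                     # non-zero status when the option is missing
            print value
        }
    ' "$1"
}
```

For the file above, cnf_value files/my.cnf explicit_defaults_for_timestamp should print 0; a non-zero exit status means the option is missing.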
Likewise, a mysql.yml playbook that applies the mysql role is needed, at the same level as the roles directory
mysql.yml
---
- name: Install mysql
  hosts: '{{ hosts }}'
  gather_facts: true
  become: true
  become_method: sudo
  roles:
    - mysql
clouderamanager
The directory structure of the clouderamanager role:
erniudeMacBook-Pro:roles erniu$ tree -L 3 clouderamanager/
clouderamanager/
├── README.md
├── defaults
│   └── main.yml
├── files
│   ├── cdh6
│   │   └── 6.3.2
│   └── cm6
│       └── 6.3.1
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml
The cdh6 and cm6 source files under files can no longer be downloaded from the official site; if you need them, you can contact me on WeChat
tasks/main.yml
---
- name: Add hosts dns
  blockinfile:
    path: /etc/hosts
    block: |
      {{ item.ip }} {{ item.name }}
    marker: "# {mark} ANSIBLE MANAGED BLOCK {{ item.name }}"
  loop:
    - { name: cdh01, ip: 172.16.33.101 }
    - { name: cdh02, ip: 172.16.33.102 }
    - { name: cdh03, ip: 172.16.1.18 }
- name: Install Apache
  apt:
    name: apache2
    update_cache: yes
    state: latest
- name: Start Apache
  systemd:
    name: apache2
    state: started
    enabled: yes
- name: Upload server ntp.conf
  copy:
    src: ntp.conf
    dest: /etc/
    mode: '0644'
- name: Start ntp service
  service:
    name: ntp
    state: restarted
    enabled: yes
- name: Upload cdh files
  copy:
    src: cdh6
    dest: /var/www/html/
- name: Upload cloudera manager files
  copy:
    src: cm6
    dest: /var/www/html/
- name: Install createrepo
  apt:
    name: createrepo
    state: present
- name: Create cm repo
  command:
    cmd: createrepo .
    chdir: /var/www/html/cm6
- name: Add cm apt repo
  copy:
    src: cm.list
    dest: /etc/apt/sources.list.d/
    mode: '0644'
- name: Add archive.key
  apt_key:
    url: http://cdh01/cm6/6.3.1/ubuntu1804/apt/archive.key
    state: present
- name: Apt update
  apt:
    update_cache: true
- name: Install oracle-j2sdk1.8
  apt:
    name: oracle-j2sdk1.8
    state: present
- name: Install cloudera-manager server
  apt:
    name:
      - cloudera-manager-server
      # - cloudera-manager-server-db-2
    state: present
- name: Install libmysql-java
  apt:
    name: libmysql-java
    state: present
- name: Start cloudera manager
  service:
    name: cloudera-scm-server
    state: restarted
    enabled: yes
The host entries here (names and IP addresses) must be changed to match your own environment
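Since the inventory already records a hostname= variable for every address, the name/IP pairs for the hosts task can be derived from it rather than typed by hand. A minimal sketch; inventory_to_hosts is a hypothetical helper name, and it assumes each host line has the form "address hostname=name" as in the inventory shown earlier:

```shell
# Print unique "address name" pairs from an INI-style Ansible inventory.
# inventory_to_hosts is a made-up helper; it only understands lines whose
# second field is hostname=<name>.
inventory_to_hosts() {
    # $1 = inventory file
    awk '
        $2 ~ /^hostname=/ {
            name = substr($2, 10)            # strip the 9-character "hostname=" prefix
            if (!seen[$1]++) print $1, name  # a host listed in several groups is printed once
        }
    ' "$1"
}
```

Running inventory_to_hosts host prints one line per machine, in the same "IP name" layout that the blockinfile task writes into /etc/hosts.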
files/cm.list
deb [arch=amd64] http://cdh01/cm6/6.3.1/ubuntu1804/apt bionic-cm6.3.1 contrib
clouderamanager.yml, the playbook that applies the clouderamanager role
---
- name: Install clouderamanager
  hosts: '{{ hosts }}'
  gather_facts: true
  become: true
  become_method: sudo
  roles:
    - clouderamanager
clouderaagent
The directory structure of the clouderaagent role:
erniudeMacBook-Pro:roles erniu$ tree -L 3 clouderaagent/
clouderaagent/
├── README.md
├── defaults
│   └── main.yml
├── files
│   └── cm.list
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml
tasks/main.yml
---
- name: Add hosts dns
  blockinfile:
    path: /etc/hosts
    block: |
      {{ item.ip }} {{ item.name }}
    marker: "# {mark} ANSIBLE MANAGED BLOCK {{ item.name }}"
  loop:
    - { name: cdh01, ip: 172.16.33.101 }
    - { name: cdh02, ip: 172.16.33.102 }
    - { name: cdh03, ip: 172.16.1.18 }
- name: Add cm apt repo
  copy:
    src: cm.list
    dest: /etc/apt/sources.list.d/
    mode: '0644'
- name: Add archive.key
  apt_key:
    url: http://cdh01/cm6/6.3.1/ubuntu1804/apt/archive.key
    state: present
- name: Upload client ntp.conf
  copy:
    src: ntp.conf
    dest: /etc/
    mode: '0644'
- name: Start ntp service
  service:
    name: ntp
    state: restarted
    enabled: yes
- name: Apt update
  apt:
    update_cache: true
- name: Install oracle-j2sdk1.8
  apt:
    name: oracle-j2sdk1.8
    state: present
- name: Install packages
  apt:
    name: "{{ packages }}"
  vars:
    packages:
      - libxslt-dev
      - libmysql-java
- name: Install cloudera-manager agent
  apt:
    name:
      - cloudera-manager-daemons
      - cloudera-manager-agent
    state: present
- name: Change cloudera agent config
  lineinfile:
    path: /etc/cloudera-scm-agent/config.ini
    regexp: '^server_host='
    line: server_host=cdh01
- name: Start cloudera agent
  service:
    name: cloudera-scm-agent
    state: restarted
    enabled: yes
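The lineinfile task above rewrites the server_host line if one exists and appends it otherwise. The same behaviour can be approximated by hand when debugging a single agent. A rough sketch; set_server_host is a hypothetical helper name, and unlike lineinfile it rewrites every matching line and replaces the file through a temporary copy, so ownership and permissions may need to be restored afterwards:

```shell
# Point a Cloudera agent config file at a given Cloudera Manager host.
# set_server_host is a made-up helper that mimics the lineinfile task.
set_server_host() {
    # $1 = agent config file, $2 = Cloudera Manager host name
    if grep -q '^server_host=' "$1"; then
        # Rewrite the existing setting via a temporary file (avoids relying on sed -i)
        sed "s/^server_host=.*/server_host=$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        # No such line yet: append it, as lineinfile would
        printf 'server_host=%s\n' "$2" >> "$1"
    fi
}
```

For example, set_server_host /etc/cloudera-scm-agent/config.ini cdh01 (run as root) followed by a restart of cloudera-scm-agent corresponds to the last two tasks of the role.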
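III. Deploying the Services
Deploy common
common contains the base settings, so it must be applied to every machine
ansible-playbook -i host common.yml -e "hosts=all"
Deploy mysql
Apply it to the MySQL server only
ansible-playbook -i host mysql.yml -e "hosts=db_mysql"
Configure MySQL
Connect to the database server remotely; running sudo mysql logs you in to the database
Set the root password:
grant all privileges on *.* to root@'localhost' identified by '123456';
Create the databases and users, choosing the ones you need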
# Cloudera Manager Server
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '123456';
# Activity Monitor
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '123456';
# Reports Manager
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '123456';
# Hue
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '123456';
# Hive Metastore Server
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON metastore.* TO 'metastore'@'%' IDENTIFIED BY '123456';
# Sentry Server
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '123456';
# Oozie
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '123456';
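The statements above follow one pattern per service, so they can also be generated instead of pasted. A minimal sketch; gen_cdh_sql is a hypothetical helper name, and it assumes every account uses the same password and that the database and user names are identical, as in the listing above:

```shell
# Print CREATE DATABASE / GRANT statements for the CDH service databases.
# gen_cdh_sql is a made-up helper; edit the list to match the services you install.
gen_cdh_sql() {
    # $1 = password shared by all accounts (an assumption taken from the listing above)
    for db in scm amon rman hue metastore sentry oozie; do
        printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n" "$db"
        # %% prints a literal % for the wildcard host
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$db" "$1"
    done
}
```

The output can be reviewed first and then piped into the client, for example gen_cdh_sql 123456 | sudo mysql.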
Deploy clouderamanager
ansible-playbook -i host clouderamanager.yml -e "hosts=cloudera_manager"
Check the status of clouderamanager
systemctl status cloudera-scm-server
Initialize the database:
sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm 123456
Deploy clouderaagent
ansible-playbook -i host clouderaagent.yml -e "hosts=cloudera_agent"
Check the status of clouderaagent
systemctl status cloudera-scm-agent
On cdh01
Create the parcel-repo and restart cloudera-scm-server
sudo cp /var/www/html/cdh6/6.3.2/parcels/* /opt/cloudera/parcel-repo/
cd /opt/cloudera/parcel-repo/
sudo mv CDH-6.3.2-1.cdh6.3.2.p0.1605554-bionic.parcel.sha1 CDH-6.3.2-1.cdh6.3.2.p0.1605554-bionic.parcel.sha
sudo systemctl restart cloudera-scm-server
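Before restarting the server it can help to confirm that the copied parcel actually matches its checksum file, since a truncated copy may otherwise only surface later in the wizard. A minimal sketch; parcel_sha_ok is a hypothetical helper name, and it assumes the .sha file holds the SHA-1 hex digest as its first field and that sha1sum (GNU coreutils) is available:

```shell
# Succeed only if a parcel matches the digest stored in its sibling .sha file.
# parcel_sha_ok is a made-up helper for illustration.
parcel_sha_ok() {
    # $1 = path to the .parcel file; the digest is read from "$1.sha"
    expected=$(awk '{ print $1; exit }' "$1.sha")
    actual=$(sha1sum "$1" | awk '{ print $1 }')
    # Fail on an empty digest as well as on a mismatch
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, parcel_sha_ok CDH-6.3.2-1.cdh6.3.2.p0.1605554-bionic.parcel && echo OK, run inside /opt/cloudera/parcel-repo/ after the rename above.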
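IV. Creating the Cluster
Log in with a browser at http://172.16.33.101:7180/
The default username and password are admin / admin
Follow the prompts; the free edition is selected here
After clicking Continue, the wizard moves on to the cluster creation steps after a short while
Set the cluster name
The three hosts have already been detected here; select them all and click Continue
The parcel repository has already been configured
Install Parcels
Once this step completes, log in to cdh01 and edit hive-site.xml so that it supports the utf8 character set
File path: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/etc/hive/conf.dist/hive-site.xml
Append &useUnicode=true&characterEncoding=UTF-8 to the connection URL (inside the XML file each & has to be written as &amp;)
<value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
(Online sources claim that changing this file lets Hive support Chinese characters when creating tables, but in practice the initial tables created by Hive still use the latin1 character set.)
Choose Accept
Choose Custom Services and tick the services you want to install
Below is my selection; you can also leave services unselected and add them one by one later as needed
Customize the role assignments to suit your own layout
Database setup
The host name here is that of the machine running your database; the database names, usernames, and passwords are the ones created earlier
Review changes
If any configuration needs changing, it can be changed at this step
Deployment complete
The cluster is now created
Appendix:
The hive-site.xml change described above does not actually make the Hive metastore database support Chinese characters, so the tables have to be altered manually
### Change the table character sets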
alter table COLUMNS_V2 default character set utf8;
alter table TABLE_PARAMS default character set utf8;
alter table PARTITION_PARAMS default character set utf8;
alter table PARTITION_KEYS default character set utf8;
alter table INDEX_PARAMS default character set utf8;
alter table TBLS default character set utf8;
### Change the column character sets
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table COLUMNS_V2 modify column COLUMN_NAME varchar(767) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(20000) character set utf8;
alter table TABLE_PARAMS modify column PARAM_KEY varchar(256) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(20000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_KEY varchar(256) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(20000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_NAME varchar(128) character set utf8;
alter table PARTITION_KEYS modify column PKEY_TYPE varchar(767) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_KEY varchar(256) character set utf8;
ALTER TABLE TBLS modify COLUMN VIEW_EXPANDED_TEXT mediumtext CHARACTER SET utf8;
ALTER TABLE TBLS modify COLUMN VIEW_ORIGINAL_TEXT mediumtext CHARACTER SET utf8;
ALTER TABLE TBLS modify COLUMN OWNER varchar(767) CHARACTER SET utf8;
ALTER TABLE TBLS modify COLUMN OWNER_TYPE varchar(10) CHARACTER SET utf8;
ALTER TABLE TBLS modify COLUMN TBL_NAME varchar(256) CHARACTER SET utf8;
ALTER TABLE TBLS modify COLUMN TBL_TYPE varchar(128) CHARACTER SET utf8;