介绍
官网https://www.infobright.com/
Infobright是一个基于独特的专利知识网格技术的列式数据库,能够降低您90%的管理工作量。在一台PC服务器上,Infobright企业版在对50TB甚至更多数据量进行多并发复杂查询时,能够显示出令人惊叹的速度,相比于MySQL,其查询速度提升了数倍甚至数十倍。在同类产品中,Infobright的单机性能处于领先地位
分为ICE(社区)版本和IEE(企业)版本
适合场景
我的理解是:
infobright的单机性能显著,压缩比高。适合中型数据(GB级别的查询应该是无敌的)复杂,低并发,快速的查询
ICE版本不支持DML操作,就算是支持,也不建议,因为是列式存储,事物做的不好
使用难度
可以把数据库粗略的划分成存储层和逻辑层,infobright的存储层是列式存储,逻辑层和mysql的成逻辑层一样,可以理解成一个列士存储的mysql
所以,我们使用infobright和使用mysql是一样的,包括所有DDL,命令行,连接工具,支持的语言。
使用基本无难度
优势
20:1到 40:1的高压缩比
快速一致的查询性能,即使当数据量急剧增加
存储百万兆字节和长期分析所需的历史数据
加载速度在每小时百万兆字节,提供实时查询处理或报警
平台
Windows和linux
都测试通过
ICE和IEE版本区别
相同
Column-oriented database, ideal foranalytics
都是列式存储,主要用于分析
Knowledge Grid and Infobright Optimizer
知识网格和Infobright优化器,指的是对sql的优化,适合列式查询
DomainExpert technology
指的是某个技术吧
Industry leading compression (10:1 onaverage)
行业领先的压缩(10:1平均)
Scalable to 50 TB (raw data), 30+concurrent queries (with appropriate hardware)
50TB的存储,30+的并发(根据硬件而定)
Supports Intel and AMD hardware
支持英特尔和AMD的硬件
不同
Superior query performance
IEE比ICE快50-500%
Multi-core query execution
ICE版本一次查询支持单核,加载也是支持单核,IEE版本近可能利用硬件,我用top命令看了下,我的cpu利用率一直是3%
Concurrent Query While LOAD/DML
IEE支持表级别并发DML,数据加载和LOAD数据,同时发生,ICE不支持
也就是说ICE版本,当数据库加载数据的时候,查询是需要等待的、
也就是说当加载这张表的时候,你的sql如果用到这个表,就必须等
DML support (INSERT, UPDATE, DELETE)
IEE版本支持DML操作,ICE不支持
Source code downloadable
ICE开放源码
IEE不开源码
High availability support andReplication
IEE版支持主从复制和数据备份,ICE不支持
Distributed Load Processor with Hadoopconnector supported
IEE支持分布式处理器负载,提供Hadoop的连接器,使用更简单的方法来提取数据。
以很高的速度加载HDFS的数据到Hadoop中。
Infobright Loader supported
IEE可以并行,多线程以300GB每小时的速度加载数据
同时支持ASCII和二进制文件加载。
IEE最新的4.0版本,DLP支持没小时2TB的加载数据
ICE 只支持一次装载,一个线程以300GB每小时的速度加载数据。多个装载是可以同时进行的。只支持ASCII
Native MySQL loader supported
IEE用户还可以使用本地MySQL装载机选项,提供额外的灵活性的,用于加载嵌入式函数、转换和模拟自动增量的表
ALTER TABLE / RENAME TABLE support
IEE支持修改表表结构和重名吧表
Temp table support
IEE支持这种语句
CREATE TABLE REPAYMENT_STATISTICS_1 SELECT *from REPAYMENT_STATISTICS;
ICE不支持
Product support, warranty andindemnification
IEE产品支持,保修和赔偿
License
不同的License
OS Support
IEE支持Linux and Solaris 64 bit, Windows 32/64 bit
IEC 支持Linux 32/64 bit, Windows 32/64 bit
Mysql和infobright并存
最后不要装在一起,我的测试结果是安装infobright后,使用yum安装mysql成功,但直接用rpm装是失败的
安装
ICE给的安装策略太简单了,建议结合IEE 看ICE
最好不要安装mysql
1 rpm -i infobright-4.0.7-0-x86_64-ice.rpm--prefix /usr/local
2 /usr/local/infobright/postconfig.sh 配置,如果让注册就选择N ,不报错就好了
这步一定要做,要不然/etc/my-ib.cnf都找不到
3修改my-ib.cnf配置文件
把/usr/local/infobright/data下的东西,全部靠到指定的数据目录
4 修改data目录下的brighthouse.ini
5 /etc/init.d/mysqld-ib start 启动
6 /etc/init.d/mysqld-ib stop 停止
7 mysql-ib -u root 进入
8 rpm -e infobright 卸载
9 千万别忘了把所有操作的目录都改成777,因为infobright是用mysql用户的
IEE版本安装
我用root账号操作
1 rpm -i infobright-iee_mysql-4.8.1-rhel_centos_6_64.rpm --prefix/usr/local
2 修改vi /etc/my-ib.cnf
cp -r /usr/local/infobright/data/*/data00/infobright/data/ 拷贝全部data的下的数据进入新目录
然后把rm -rf /data00/infobright/data/iee-mysql.err 让程序重新生成这个文件,要不然记录不了日志
3 mv infobright.cnf.sampleinfobright.cnf 所有参数都在这里改了,4.8以后就不用brighthouse.ini 了
然后修改这个配置文件里的ServerMainHeapSize参数
4 把infobright\ license.lic靠到datadir目录下 并且 mv infobright\ license.licinfobright.lic
5 chown -R mysql:mysql /data00/infobright/
6 chown -R mysql:mysql/usr/local/infobright;chown -R mysql:mysql /usr/local/infobright-4.8.1-x86_64/
5 /etc/init.d/mysqld-ib start 把权限都给mysql 因为使用mysql账号在操作的,所以要赋值
/usr/local/infobright/postconfig.sh 这是个操作infobright的工具
升级
没试过
创建用户
Delete FROMuser Where User='fengxian'and Host='localhost';
Delete FROM user Where User='fengxian'andHost='%';
CREATE USER 'fengxian'@'localhost'IDENTIFIED BY 'fengxian123';
GRANT ALL ON *.* TO 'fengxian'@'localhost';
grant all privileges on *.* to fengxian@'%'identified by 'fengxian123'
flush privileges;
revoke all on *.* from fengxian@'%';
revoke all on *.* from fengxian@localhost;
flush privileges;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%'IDENTIFIED BY 'root' WITH GRANT OPTION;
密码
默认root密码是空
可以自己修改下:
/usr/local/infobright/bin/mysqladmin -uroot password 'root'
远程连接
和修改mysql是一样的
grant all privileges on *.* to root@'%' identified by 'root'
flush privileges;
端口默认是5029
数据类型
基本上能想到的都支持
Numeric Types Min Value Max Value
TINYINT -127 127
BOOL, BOOLEAN -127 127
SMALLINT -32767 32767
MEDIUMINT -8388608 8388607
INT -2147483647 2147483647
BIGINT -9223372036854775806 9223372036854775806
FLOAT -3.402823466E+38 3.402823466E+38
DOUBLE (DOUBLE PRECISION) -1.7976931348623157E+308 1.7976931348623157E+308
DEC (M,D) -(1.0E+M - 1)/(1.0E+D) (1.0E+M - 1)/(1.0E+D)
WHERE 0<M AND 0<=D<=M
Date and Time Types Min Value Max Value Format
DATE 100-01-01 9999-12-31 YYYY-mm-dd
DATETIME 100-01-01 00:00:00 9999-12-31 23:59:59 YYYY-mm-dd HH:MM:SS
TIMESTAMP 1970-01-01 00:00:00 2038-01-01 00:59:59 YYYY-mm-dd HH:MM:SS
TIME -838:59:59 838:59:59 HHH:MM:SS
YEAR (4-digit format only) 1901 2155 YYYY
String Types Max Value
CHAR(N) 255
VARCHAR(N) 65532
BINARY(N) 255
VARBINARY(N) 65532
TINYTEXT 255
TEXT(N) 65535
数据类型的效率
最高效,更好的压缩
TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT
DECIMAL
—note: storing as (10,0) will be much faster than (10,5)
DATE, TIME
一般
CHAR, VARCHAR
尽量替换
最低效
BINARY, VARBINARY
FLOAT
DOUBLE
TINYTEXT, TEXT
创建表和语法和DDL
官方是这么说的,就是和mysql的DDL是一样的,包括创建表等语句
ICE supports standard MySQL DDL, but if yourschema was for Oracle or SQL Server, there are possibly syntax or data typeswhich are not supported. For more information, see:
mysql.com
创建表语句举例
CREATE TABLE `ACTUAL_REPAYMENT` (
`ACTUAL_REPAYMENT_ID` int(11) DEFAULT NULL,
`PAY_ID` int(11) DEFAULT NULL,
`STA_ID` int(11) DEFAULT NULL,
`REPAYMENT_AMT` varchar(100) DEFAULT NULL,
`REPAYMENT_INTEREST` varchar(100) DEFAULT NULL,
`REPAYMENT_PENALTY` varchar(100) DEFAULT NULL,
`REPAYMENT_PRINCIPAL` varchar(100) DEFAULT NULL,
`REPAYMENT_BREACH` varchar(100) DEFAULT NULL,
`REPAYMENT_MANAGE` varchar(100) DEFAULT NULL,
`DEPOSIT_DONE_PAY` varchar(100) DEFAULT NULL,
`PAY_DATE` varchar(100) DEFAULT NULL,
`GUARANTEE_BASE_PAY` varchar(100) DEFAULT NULL,
`PLATFORM_ADVANCE_PAY` varchar(100) DEFAULT NULL
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=utf8;
注意:
不执行任何参照完整性检查,也不支持默认值。Infobright优化器会接受这些条款,而这些条款,在表的定义被忽略。
ENGINE=BRIGHTHOUSE 这个是mysql 中没有的,但是就算你学的是mysql中的其他ENGINE也会给你自动转化成这个,也就是说,你可以把mysql中的建表语句直接拿过来是没问题的,他帮你优化
加载数据
只支持文件加载
语法
LOAD DATA INFILE '/full_path/file_name'INTO TABLE table_name
[FIELDS
[TERMINATED BY 'char']
[ENCLOSED BY 'char']
[ESCAPED 'char']
];
TERMINATED BY 自动分割符 支持多字符 默认是“,”
--escaped-by,
enclosed 字段包围符,默认没有包围符
escaped 转义字符,默认没有转义字符
在Infobright中,enclosed 和escaped要成对使用
load data infile '/tmp/loadData/test' intotable test FIELDS TERMINATED BY ','
enclosed 举例
数据是'The TimeTraveler's Wife'
load data infile '/tmp/loadData/test' intotable test FIELDS TERMINATED BY ',' '\'';
结果是The TimeTraveler's Wife 干掉了行首和行尾的单引号
Escaped 举例
数据是'The TimeTraveler'/s Wife'
load data infile '/tmp/loadData/test' intotable test FIELDS TERMINATED BY ',' ESCAPED BY '\\';
结果是The TimeTraveler's Wife
也就是起到的作用是把单引号转义了
其实这两个参数在绝大多数数据库中都有,用于导入导出配合
导出的时候字段前后加上特殊字符,导入的时候去掉特殊字符,放在回车等出现
Set AUTOCOMMIT=1; 自动提交数据
Set AUTOCOMMIT=0; 手动提交 再来一下COMMIT;
忽略数据
默认是导入数据的时候,如果有一条错了,就挂了,就都导入不了了
我们可以设置可以忽略多少条数,超过一个定值再报错
按记录条数来
/** when the number of rows rejectedreaches 10, abort process **/
set @BH_REJECT_FILE_PATH ='/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 10;
load data infile DATAFILE.csv into table T;
按记录百分比来
/** if 3% of the number of rows error, thenabort the commit process **/
set @BH_REJECT_FILE_PATH ='/tmp/reject_file';
set @BH_ABORT_ON_THRESHOLD = 0.03;
load data infile DATAFILE.csv into table T;
关闭这个功能,默认就是关闭
/** Disable the reject file feature **/
set @BH_REJECT_FILE_PATH = NULL;
set @BH_ABORT_ON_COUNT = NULL;
Lookups
其实就是把CHAR或VARCHAR 内部用一个压缩比超过10:1的证书型去替换,生成索引,这样就查询就更快了
适合
查找替换CHAR或VARCHAR值与总数与唯一值超过10:1就适合用
如果你一列100000个值,只有10000个是唯一的,那么是
Comment Lookup比较适合年龄,性别,省份这一类型的字段
用法:再列上直接加 comment 'lookup'
优化查询
1 使用高效的字段类型
2 使用优化过的函数和sql
Supported SQL via the InfobrightOptimizer
Select Statements
SELECT [ ALL | DISTINCT | DISTINCTROW ]
SELECT expr,
AS alias_name
FROM table reference
WHERE where condition
GROUP BY {col_name | expr | position}
HAVING where_condition
ORDER BY {col_name | expr | position} ASC|DESC
ORDER BY NULL
LIMIT { [ offset,] row_count | row_count OFFSET offset}
INTO OUTFILE file_name export_options
JOIN SYNTAX
– all supported
UNION SYNTAX
– all supported (except Row Subqueries)
Control Flow Functions
CASE
IF
IFNULL
NULLIF
String Functions
ASCII
BIN
BIT_LENGTH
CHAR_LENGTH
CHARACTER_LENGTH
CONCAT_WS
CONCAT
ELT
EXPORT_SET
FIELD
FIND_IN_SET
FORMAT
HEX
INSERT
INSTR
LCASE
LEFT
LENGTH
LOCATE
LOWER
LPAD
LTRIM
MAKE_SET
MID
NOT REGEXP
OCTET_LENGTH
ORD
POSITION
QUOTE
REGEXP
REPEAT
REPLACE
REVERSE
RIGHT
RLIKE
RPAD
RTRIM
SOUNDEX
SPACE
STRCMP
SUBSTR
SUBSTRING_INDEX
SUBSTRING
TRIM
UCASE
UPPER
Group By Aggregate Functions
AVG
COUNT
MIN
MAX
SUM
STDDEV
STDDEV_SAMP
STDDEV_POP
VARRIANCE
VAR_SAMP
VAR_POP
Arithmetic Operators
addition +
subtraction -
multiplication *
division /
modulo %
Comparison Functions and Operators
equal =
not equal
<>,!=
less than or equal
<=
less than
<
greater than or equal
>=
greater than
>
IS NULL
IS NOT NULL
BETWEEN ... AND
(except in join conditions)
NOT BETWEEN ... AND
IN
NOT IN
Logical Operators
NOT, !
(exceptin join conditions)
AND, &&
OR, | |
Mathematical Functions
ABS
ACOS
ASIN
ATAN2, ATAN
ATAN
CEIL
CEILING
CONV
COS
COT
DEGREES
EXP
FLOOR
LN
LOG10
LOG2
LOG
MOD
OCT
PI
POW
POWER
RADIANS
RAND
ROUND
SIGN
SIN
SQRT
TAN
TRUNCATE
Date and Time Functions
ADDDATE
+
ADDTIME
CURDATE
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURTIME
DATE_ADD
DATE_FORMAT
DATE_SUB
DATE
DATEDIFF
DAY
DAYNAME
DAYOFMONTH
DAYOFWEEK
DAYOFYEAR
EXTRACT
FROM_UNIXTIME
HOUR
LOCALTIME
LOCALTIMESTAMP
MINUTE
MONTH
MONTHNAME
NOW
PERIOD_ADD
PERIOD_DIFF
QUARTER
SECOND
SUBDATE
SUBTIME
SYSDATE
TIME_FORMAT
TIME
TO_DAYS
UNIX_TIMESTAMP
WEEK
YEARWEEK
Cast Functions and Operators
CAST
CONVERT
Data Manipulation Statements
LOAD DATA INFILE
SELECT
(seedetailed list)
Data Definition Statements
CREATE DATABASE
CREATE TABLE
DROP DATABASE
DROP TABLE
3尽量不适用or,可以采用in或者union取而代之
4 UNION 比UNION ALL 慢
5 因为是列式存在,所以尽量只取必须的字段
6 Limiting Tables Used in Results 这个没看懂,有空一起看下
https://www.infobright.org/index.php/ICE_Wiki/wiki-4/working-with-ice/optimizing-queries/
备份和恢复
备份,重点是data和KNFolder 目录,我没试过
内存调优
修改/usr/local/infobright/data/brighthouse.ini在配置文件里有建议例子,如下
System | Server Main | Server Compressed | Loader Main |
Memory | Heap Size | Heap Size | Heap Size |
—————- | ———————- | ——————————— | ——————— |
8 GB | 4000 | 500 | 800 |
4 GB | 1600 | 375 | 560 |
2 GB | 600 | 256 | 320 |
ServerMainHeapSize 运行内存
ServerCompressedHeapSize 压缩内存
LoderMainHeapSize 加载内存
测试出的建议
1 不支持主键
2 创建表的时候,尽量标准,不用想hive中一样都是String,按照传统数据库设置
3 关联的键尽量不要varchar关联
测试结果
语句 | 表数据 | mysql | infobright |
select * from test.`name` t1 INNER JOIN test.age t2 on t1.u_id = t2.u_id | 6千万join 1 | 8s | 0.8s |
select * from test.`name` t1 INNER JOIN test.book t2 on t1.u_id = t2.u_id | 6千万join 2 | 12s | 1.7s |
select * from test.`name` t1 INNER JOIN test.phone t2 on t1.u_id = t2.u_id | 6千万join 100 | >5分钟 | 8s |
语句 | 表数据 | oracle | infobright |
SELECT RS.CONTRACT_ID ,BBI.S_CUSTTOMER_NAME ,str_to_date(AR.PAY_DATE, 'YYYY-MM-DD') ,AR.PAY_ID ,ARD.PERIOD ,ARD.REPAYMENT_AMT ,ARD.REPAYMENT_PRINCIPAL ,ARD.REPAYMENT_INTEREST ,ARD.REPAYMENT_PENALTY ,ARD.REPAYMENT_MANAGE FROM ACTUAL_REPAYMENT_DETAIL ARD LEFT JOIN ACTUAL_REPAYMENT AR ON AR.ACTUAL_REPAYMENT_ID = ARD.ACTUAL_REPAYMENT_ID LEFT JOIN REPAYMENT_STATISTICS RS ON RS.STA_ID = AR.STA_ID LEFT JOIN B_APPLY_INFO BAI ON BAI.I_APPLY_ID = RS.APPLY_ID LEFT JOIN B_BASIC_INFO BBI ON BBI.I_BORROW_ID = BAI.I_BORROW_ID WHERE ARD.REPAYMENT_DETAIL_ID IN ( SELECT DISTINCT ARD.REPAYMENT_DETAIL_ID FROM ACTUAL_REPAYMENT_DETAIL_ALLOT ARDA LEFT JOIN ACTUAL_REPAYMENT_DETAIL ARD ON ARD.REPAYMENT_DETAIL_ID = ARDA.REPAYMENT_DETAIL_ID LEFT JOIN ACTUAL_REPAYMENT AR ON AR.ACTUAL_REPAYMENT_ID = ARD.ACTUAL_REPAYMENT_ID WHERE ARDA.LENDER_USER_ID = 3 ) ORDER BY RS.CONTRACT_ID,ARD.PERIOD; | 609742 642132 18439478 601895 2141894 114709 | 212s | 12s |
Knowledge Grid Statistics
基本就是infobright的核心架构了 查询优化方式,我没看太懂
http://www.dataguru.cn/article-6815-1.html
https://www.infobright.org/index.php/ICE_Wiki/wiki-4/working-with-ice/knowledge-grid-statistics/