Phoenix is an open source SQL skin for HBase. You use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data.(Phoenix是构建在HBase上的一个SQL层,能让我们用标准的JDBC APIs而不是HBase客户端APIs来创建表,插入数据和对HBase数据进行查询。Phoenix完全使用Java编写,作为HBase内嵌的JDBC驱动。Phoenix查询引擎会将SQL查询转换为一个或多个HBase扫描,并编排执行以生成标准的JDBC结果集。)
Phoenix作为应用层和HBASE之间的中间件,以下特性使它在大数据量的简单查询场景有着独有的优势。
-
二级索引支持(global index + local index)
-
编译SQL成为原生HBASE的可并行执行的scan
-
在数据层完成计算,server端的coprocessor执行聚合
-
下推where过滤条件到server端的scan filter上
-
利用统计信息优化、选择查询计划(5.x版本将支持CBO)
-
skip scan功能提高扫描速度
一般可以使用以下三种方式访问Phoenix
-
JDBC API
-
使用Python编写的命令行工具(sqlline, sqlline-thin和psql等)
-
SQuirrel
- First, let’s create a us_population.sql file, containing a table definition:
-
CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));
Now let’s create a us_population.csv file containing some data to put in that table:
- And finally, let’s create a us_population_queries.sql file containing a query we’d like to run on that data.
-
NY,New York,8143197 CA,Los Angeles,3844829 IL,Chicago,2842518 TX,Houston,2016582 PA,Philadelphia,1463281 AZ,Phoenix,1461575 TX,San Antonio,1256509 CA,San Diego,1255540 TX,Dallas,1213825 CA,San Jose,912332
SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum" FROM us_population GROUP BY state ORDER BY sum(population) DESC;
- Execute the following command from a command terminal
./psql.py <your_zookeeper_quorum> us_population.sql us_population.csv us_population_queries.sql
Congratulations! You’ve just created your first Phoenix table, inserted data into it, and executed an aggregate query with just a few lines of code in 15 minutes or less!
Big deal - 10 rows! What else you got?
Ok, ok - tough crowd. Check out our bin/performance.py script to create as many rows as you want, for any schema you come up with, and run timed queries against it.
Why is it called Phoenix anyway? Did some other project crash and burn and this is the next generation?
I’m sorry, but we’re out of time and space, so we’ll have to answer that next time!
参考:http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
注意:
1.phoenix适合修改数据
2.phoenix适合过滤、查询、join操作
3.phoenix不适合聚合操作
4.合理地设置rowkey 提升性能
5.自动按照rowkey的升序进行存储