Phoenix, the OLAP Query Engine, in 15 Minutes


Phoenix is an open source SQL skin for HBase: you use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data. Phoenix is written entirely in Java and ships as an embeddable JDBC driver for HBase. Its query engine compiles a SQL query into one or more HBase scans and orchestrates their execution to produce standard JDBC result sets.

Phoenix sits between the application layer and HBase as middleware, and the following features give it a distinct advantage for simple queries over large volumes of data.

  • Secondary-index support (global index + local index)

  • Compiles SQL into native HBase scans that can execute in parallel

  • Computes at the data layer: server-side coprocessors perform aggregation

  • Pushes WHERE predicates down into server-side scan filters

  • Uses statistics to optimize and choose query plans (5.x will support CBO)

  • A skip-scan feature that speeds up scans
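To make the skip-scan bullet concrete: instead of scanning every row between the first and last match, Phoenix seeks directly to each qualifying sub-range of the composite rowkey. A toy Python sketch of the idea over a sorted (state, city) keyspace (the data and function are illustrative, not Phoenix internals):

```python
import bisect

# Toy model of a skip scan over a composite rowkey (state, city).
# HBase stores rowkeys in ascending sorted order, as this list does.
rowkeys = sorted([
    ("AZ", "Phoenix"), ("CA", "Los Angeles"), ("CA", "San Jose"),
    ("NY", "New York"), ("TX", "Dallas"), ("TX", "Houston"),
])

def skip_scan(keys, wanted_states):
    """Seek directly to each matching sub-range of the leading key column,
    skipping the non-matching rows in between."""
    hits = []
    for state in sorted(wanted_states):
        # Binary-search "seek" to where this state's sub-range begins.
        i = bisect.bisect_left(keys, (state,))
        while i < len(keys) and keys[i][0] == state:
            hits.append(keys[i])
            i += 1
    return hits

print(skip_scan(rowkeys, {"CA", "TX"}))
```

A real skip scan issues these seeks inside the HBase region server via a filter, so the rows in skipped ranges are never read at all.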

Phoenix is generally accessed in one of three ways:

  1. JDBC API

  2. Python-based command-line tools (sqlline, sqlline-thin, psql, etc.)

  3. SQuirrel
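Programmatic access can also be scripted from Python through the Phoenix Query Server using the community `phoenixdb` driver. A minimal sketch; the query-server URL `http://localhost:8765/` and a running query server are assumptions, and the table queried is the `us_population` example built below:

```python
# Hedged sketch: querying Phoenix via the Phoenix Query Server with the
# `phoenixdb` package (pip install phoenixdb). Assumes a query server is
# listening at http://localhost:8765/ -- adjust for your cluster.
QUERY = (
    'SELECT state, COUNT(city) AS "City Count", SUM(population) AS "Population Sum" '
    "FROM us_population GROUP BY state ORDER BY SUM(population) DESC"
)

def state_populations(url="http://localhost:8765/"):
    # Imported lazily so the sketch loads even without the package installed.
    import phoenixdb
    conn = phoenixdb.connect(url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()
    finally:
        conn.close()
```

JDBC from Java looks much the same: the driver class is `org.apache.phoenix.jdbc.PhoenixDriver` and the URL takes the form `jdbc:phoenix:<your_zookeeper_quorum>`.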

  • First, let’s create a us_population.sql file, containing a table definition:
    CREATE TABLE IF NOT EXISTS us_population (
          state CHAR(2) NOT NULL,
          city VARCHAR NOT NULL,
          population BIGINT
          CONSTRAINT my_pk PRIMARY KEY (state, city));

  • Now let’s create a us_population.csv file containing some data to put in that table:
    NY,New York,8143197
    CA,Los Angeles,3844829
    IL,Chicago,2842518
    TX,Houston,2016582
    PA,Philadelphia,1463281
    AZ,Phoenix,1461575
    TX,San Antonio,1256509
    CA,San Diego,1255540
    TX,Dallas,1213825
    CA,San Jose,912332

  • And finally, let’s create a us_population_queries.sql file containing a query we’d like to run on that data:
    SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum"
    FROM us_population
    GROUP BY state
    ORDER BY sum(population) DESC;

  • Execute the following command from a command terminal:
    ./psql.py <your_zookeeper_quorum> us_population.sql us_population.csv us_population_queries.sql
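As a sanity check on what psql.py should print for that query, the GROUP BY / ORDER BY can be reproduced over the same ten CSV rows in plain Python (a simulation of the SQL semantics, not Phoenix itself):

```python
import csv
import io
from collections import defaultdict

# The same ten rows as us_population.csv.
CSV_DATA = """NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332"""

def aggregate(csv_text):
    """GROUP BY state, COUNT(city), SUM(population), ORDER BY the sum DESC."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for state, city, population in csv.reader(io.StringIO(csv_text)):
        counts[state] += 1
        sums[state] += int(population)
    return sorted(
        ((s, counts[s], sums[s]) for s in sums),
        key=lambda row: row[2],
        reverse=True,
    )

for state, city_count, population_sum in aggregate(CSV_DATA):
    print(f"{state}\t{city_count}\t{population_sum}")
```

The top of the result should be NY (one city, 8143197), then CA (three cities summing to 6012701), then TX (three cities summing to 4486916).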

Congratulations! You’ve just created your first Phoenix table, inserted data into it, and executed an aggregate query with just a few lines of code in 15 minutes or less!

Big deal - 10 rows! What else you got?
Ok, ok - tough crowd. Check out our bin/performance.py script to create as many rows as you want, for any schema you come up with, and run timed queries against it.

Why is it called Phoenix anyway? Did some other project crash and burn and this is the next generation?
I’m sorry, but we’re out of time and space, so we’ll have to answer that next time!

Reference: http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html

Notes:

1. Phoenix is well suited to modifying data.
2. Phoenix is well suited to filtering, point queries, and joins.
3. Phoenix is not well suited to aggregation-heavy workloads.
4. Design the rowkey carefully to get good performance.
5. Rows are automatically stored in ascending rowkey order.
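Notes 4 and 5 interact: because rows are stored in ascending rowkey order, a monotonically increasing key (e.g. a timestamp prefix) funnels all writes into one hot region. Phoenix's SALT_BUCKETS table option counters this by prepending a hash-derived byte to the key. A toy Python illustration of the idea (the bucket count and key format here are illustrative, not Phoenix's on-disk encoding):

```python
import zlib

def salt(rowkey: str, buckets: int = 4) -> str:
    """Prefix a stable hash-derived bucket id so sequential rowkeys spread
    across regions (a toy version of Phoenix's SALT_BUCKETS option)."""
    bucket = zlib.crc32(rowkey.encode()) % buckets
    return f"{bucket:02d}-{rowkey}"

# Sequential event ids no longer sort into one contiguous (hot) key range.
salted = sorted(salt(f"event-{i:06d}") for i in range(8))
```

The trade-off is that a salted table turns a single range scan into one scan per bucket, which Phoenix runs in parallel and merges.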
