Phoenix, the OLAP Query Engine, in 15 Minutes


Phoenix is an open source SQL skin for HBase: you use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data. Phoenix is written entirely in Java and ships as an embeddable JDBC driver for HBase. Its query engine compiles a SQL query into one or more HBase scans and orchestrates their execution to produce standard JDBC result sets.

Phoenix sits between the application layer and HBase as middleware, and the following features give it a distinct advantage for simple queries over large volumes of data.

  • Secondary-index support (global index + local index)

  • Compiles SQL into native HBase scans that can execute in parallel

  • Computes at the data layer: server-side coprocessors perform aggregation

  • Pushes WHERE predicates down into server-side scan filters

  • Uses statistics to optimize and choose query plans (5.x will support CBO)

  • A skip-scan feature that speeds up scans
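To make the skip-scan bullet concrete: instead of scanning every row between the first and last match, Phoenix seeks directly to each qualifying sub-range of the composite rowkey. A toy Python sketch of the idea over a sorted (state, city) keyspace (the data and function are illustrative, not Phoenix internals):

```python
import bisect

# Toy model of a skip scan over a composite rowkey (state, city).
# HBase stores rowkeys in ascending sorted order, as this list does.
rowkeys = sorted([
    ("AZ", "Phoenix"), ("CA", "Los Angeles"), ("CA", "San Jose"),
    ("NY", "New York"), ("TX", "Dallas"), ("TX", "Houston"),
])

def skip_scan(keys, wanted_states):
    """Seek directly to each matching sub-range of the leading key column,
    skipping the non-matching rows in between."""
    hits = []
    for state in sorted(wanted_states):
        # Binary-search "seek" to where this state's sub-range begins.
        i = bisect.bisect_left(keys, (state,))
        while i < len(keys) and keys[i][0] == state:
            hits.append(keys[i])
            i += 1
    return hits

print(skip_scan(rowkeys, {"CA", "TX"}))
```

A real skip scan issues these seeks inside the HBase region server via a filter, so the rows in skipped ranges are never read at all.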

Phoenix is generally accessed in one of three ways:

  1. JDBC API

  2. Python-based command-line tools (sqlline, sqlline-thin, psql, etc.)

  3. SQuirrel
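Programmatic access can also be scripted from Python through the Phoenix Query Server using the community `phoenixdb` driver. A minimal sketch; the query-server URL `http://localhost:8765/` and a running query server are assumptions, and the table queried is the `us_population` example built below:

```python
# Hedged sketch: querying Phoenix via the Phoenix Query Server with the
# `phoenixdb` package (pip install phoenixdb). Assumes a query server is
# listening at http://localhost:8765/ -- adjust for your cluster.
QUERY = (
    'SELECT state, COUNT(city) AS "City Count", SUM(population) AS "Population Sum" '
    "FROM us_population GROUP BY state ORDER BY SUM(population) DESC"
)

def state_populations(url="http://localhost:8765/"):
    # Imported lazily so the sketch loads even without the package installed.
    import phoenixdb
    conn = phoenixdb.connect(url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()
    finally:
        conn.close()
```

JDBC from Java looks much the same: the driver class is `org.apache.phoenix.jdbc.PhoenixDriver` and the URL takes the form `jdbc:phoenix:<your_zookeeper_quorum>`.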

  • First, let’s create a us_population.sql file, containing a table definition:
    CREATE TABLE IF NOT EXISTS us_population (
          state CHAR(2) NOT NULL,
          city VARCHAR NOT NULL,
          population BIGINT
          CONSTRAINT my_pk PRIMARY KEY (state, city));

  • Now let’s create a us_population.csv file containing some data to put in that table:
    NY,New York,8143197
    CA,Los Angeles,3844829
    IL,Chicago,2842518
    TX,Houston,2016582
    PA,Philadelphia,1463281
    AZ,Phoenix,1461575
    TX,San Antonio,1256509
    CA,San Diego,1255540
    TX,Dallas,1213825
    CA,San Jose,912332

  • And finally, let’s create a us_population_queries.sql file containing a query we’d like to run on that data:
    SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum"
    FROM us_population
    GROUP BY state
    ORDER BY sum(population) DESC;

  • Execute the following command from a command terminal:
    ./psql.py <your_zookeeper_quorum> us_population.sql us_population.csv us_population_queries.sql
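As a sanity check on what psql.py should print for that query, the GROUP BY / ORDER BY can be reproduced over the same ten CSV rows in plain Python (a simulation of the SQL semantics, not Phoenix itself):

```python
import csv
import io
from collections import defaultdict

# The same ten rows as us_population.csv.
CSV_DATA = """NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332"""

def aggregate(csv_text):
    """GROUP BY state, COUNT(city), SUM(population), ORDER BY the sum DESC."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for state, city, population in csv.reader(io.StringIO(csv_text)):
        counts[state] += 1
        sums[state] += int(population)
    return sorted(
        ((s, counts[s], sums[s]) for s in sums),
        key=lambda row: row[2],
        reverse=True,
    )

for state, city_count, population_sum in aggregate(CSV_DATA):
    print(f"{state}\t{city_count}\t{population_sum}")
```

The top of the result should be NY (one city, 8143197), then CA (three cities summing to 6012701), then TX (three cities summing to 4486916).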

Congratulations! You’ve just created your first Phoenix table, inserted data into it, and executed an aggregate query with just a few lines of code in 15 minutes or less!

Big deal - 10 rows! What else you got?
Ok, ok - tough crowd. Check out our bin/performance.py script to create as many rows as you want, for any schema you come up with, and run timed queries against it.

Why is it called Phoenix anyway? Did some other project crash and burn and this is the next generation?
I’m sorry, but we’re out of time and space, so we’ll have to answer that next time!

Reference: http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html

Notes:

1. Phoenix is well suited to modifying data.
2. Phoenix is well suited to filtering, point queries, and joins.
3. Phoenix is not well suited to aggregation-heavy workloads.
4. Design the rowkey carefully to get good performance.
5. Rows are automatically stored in ascending rowkey order.
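Notes 4 and 5 interact: because rows are stored in ascending rowkey order, a monotonically increasing key (e.g. a timestamp prefix) funnels all writes into one hot region. Phoenix's SALT_BUCKETS table option counters this by prepending a hash-derived byte to the key. A toy Python illustration of the idea (the bucket count and key format here are illustrative, not Phoenix's on-disk encoding):

```python
import zlib

def salt(rowkey: str, buckets: int = 4) -> str:
    """Prefix a stable hash-derived bucket id so sequential rowkeys spread
    across regions (a toy version of Phoenix's SALT_BUCKETS option)."""
    bucket = zlib.crc32(rowkey.encode()) % buckets
    return f"{bucket:02d}-{rowkey}"

# Sequential event ids no longer sort into one contiguous (hot) key range.
salted = sorted(salt(f"event-{i:06d}") for i in range(8))
```

The trade-off is that a salted table turns a single range scan into one scan per bucket, which Phoenix runs in parallel and merges.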
