PrestoDB Overview and Performance Testing

License: CC 4.0 BY-SA

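This article compares Hive and Presto in terms of implementation, data type support, DDL/DML statements, enhanced aggregation, window functions, UDFs, and query time, and uses a test environment to show how the two perform on different operations.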

Contents

(1) Introduction

(2) Hive and PrestoDB: comparison of functionality

(3) Hive and PrestoDB: comparison of performance

(1) Introduction

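Presto is a distributed SQL query engine developed by Facebook. It is designed specifically for fast, real-time data analysis. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions.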
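
To make the SQL features listed above concrete, here is a minimal sketch of a query that combines a join, an aggregation, and a window function. It runs against an in-memory SQLite database purely so that it is self-contained; SQLite stands in for Presto here, and the `users` and `orders` tables and their rows are hypothetical. The window function requires SQLite 3.25 or later.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for a Presto cluster,
# and the tables and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, country TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE');
    INSERT INTO orders VALUES
        (10, 1, 20.0), (11, 1, 5.0), (12, 2, 40.0), (13, 3, 15.0);
""")

QUERY = """
    SELECT country,
           user_id,
           total,
           -- window function: rank users by spend within each country
           RANK() OVER (PARTITION BY country ORDER BY total DESC) AS rank_in_country
    FROM (
        -- join + aggregation: total order amount per user
        SELECT u.country AS country,
               u.user_id AS user_id,
               SUM(o.amount) AS total
        FROM orders o
        JOIN users u ON o.user_id = u.user_id
        GROUP BY u.country, u.user_id
    ) AS totals
    ORDER BY country, rank_in_country
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The query text itself uses only standard constructs (a subquery, `JOIN`, `GROUP BY`, and `RANK() OVER`), so a similar statement could in principle be submitted to Presto, although exact behavior depends on the Presto version and connector.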

Presto architecture diagram:

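The architecture diagram shows a simplified view of the Presto system. The client sends a SQL query to the Presto coordinator. The coordinator parses, analyzes, and plans the query. The scheduler wires together the execution pipeline, assigns tasks to the nodes closest to the data, and monitors progress. The client pulls data from the output stage, which in turn pulls data from the underlying stages.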
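
The idea of assigning work to the nodes closest to the data can be sketched as follows. This is not Presto's actual scheduler; the function `assign_splits`, the worker names, and the least-loaded tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_splits(split_locations, workers):
    """Assign each data split to a worker, preferring data-local workers.

    split_locations: dict mapping split name -> list of hosts storing it
    workers: list of available worker hosts
    Hypothetical policy: choose the least-loaded local worker; if no worker
    holds the data, fall back to the least-loaded worker overall.
    """
    load = defaultdict(int)  # number of splits assigned to each worker
    assignment = {}
    for split, hosts in split_locations.items():
        local = [w for w in workers if w in hosts]
        candidates = local or workers  # fall back when nothing is local
        chosen = min(candidates, key=lambda w: load[w])
        assignment[split] = chosen
        load[chosen] += 1
    return assignment


workers = ["w1", "w2", "w3"]
splits = {
    "s1": ["w1", "w2"],  # stored on two workers
    "s2": ["w1"],
    "s3": ["w9"],        # stored on a host that is not a worker
    "s4": ["w2", "w3"],
}
assignment = assign_splits(splits, workers)
print(assignment)
```

In this toy run, `s3` has no local worker and is placed on the least-loaded worker instead, which is the kind of trade-off between locality and load that a real scheduler has to make.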

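Presto's execution model is fundamentally different from that of Hive or MapReduce. Hive translates a query into multiple stages of MapReduce tasks that run one after another. Each task reads its input from disk and writes its intermediate results back to disk. Presto, by contrast, does not use MapReduce. It uses a custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is done in memory and pipelined across the network between stages, which avoids unnecessary disk I/O and the latency that comes with it. The pipelined execution model runs multiple stages at the same time and streams data from one stage to the next as soon as it becomes available. This significantly reduces end-to-end latency for many kinds of queries.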
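
The difference between the two execution models can be illustrated with a small sketch. It is a toy model, not Presto or Hive code: the stages `scan` and `filter_even` are hypothetical, a Python list stands in for the intermediate results that MapReduce would write to disk, and generators stand in for Presto's in-memory pipeline.

```python
def scan(rows, log):
    """First stage: emit rows one at a time."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def filter_even(rows, log):
    """Second stage: keep only even values."""
    for row in rows:
        log.append(f"filter {row}")
        if row % 2 == 0:
            yield row


def staged(rows, log):
    """MapReduce-style: the first stage finishes and materializes all of its
    output (a list here, disk in MapReduce) before the next stage starts."""
    scanned = list(scan(rows, log))
    return list(filter_even(scanned, log))


def pipelined(rows, log):
    """Pipelined: each row flows to the next stage as soon as it is produced."""
    return list(filter_even(scan(rows, log), log))


staged_log, pipelined_log = [], []
staged_result = staged([1, 2, 3], staged_log)
pipelined_result = pipelined([1, 2, 3], pipelined_log)
print(staged_log)
print(pipelined_log)
```

Both variants return the same result, but the logs show the ordering difference: the staged version runs every `scan` before the first `filter`, while the pipelined version interleaves them, so downstream work starts before upstream work has finished.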

(2) Hive and PrestoDB: comparison of functionality

√: Yes; ×: No; Blue: the main differences between Hive and Presto

Feature	Hive 0.11.0	Presto 0.56
Implementation language	Java	Java
DataType
integer	√	√
string	√	√
floating point	√	√
boolean	√	√
map	√	√
list	√	√
struct	√	√
uniontype	√	×
timestamp	√	√
DDL (Data Definition Language)
create/alter/drop table	√	×
create view	√	×
truncate table	√	×
desc	√	√
create index	√	×
DML (Data Manipulation Language)
load data	√	×
insert	√	√
explain	√	√
tablesample (bucketing on a column)	√	√
group by	√	√
order by	√	√
having	√	√
limit	√	√
inner/left/right/full join	√	√
union	√	√
subqueries	√	√
Enhanced Aggregation, Cube, Grouping and Rollup	√	×
lateral view	√	×
Function
UDF	√	×
Mathematical Functions	√	√
String Functions	√	√
Date and Time Functions	√	√
Regex	√	√
Type Conversion Functions	√	×
Conditional Functions	√	√
Aggregate Functions	√	√
Windowing	√	√
Distinct	√	√
Url	√	√
Json	√	√