Presto: Interacting with petabytes of data at Facebook


By Martin Traverso


Background

Facebook is a data-driven company. Data processing and analytics are at the heart of building and delivering products for the 1 billion+ active users of Facebook. We have one of the largest data warehouses in the world, storing more than 300 petabytes. The data is used for a wide range of applications, from traditional batch processing to graph analytics [1], machine learning, and real-time interactive analytics.


For the analysts, data scientists, and engineers who crunch data, derive insights, and work to continuously improve our products, the performance of queries against our data warehouse is important. Being able to run more queries and get results faster improves their productivity.


Facebook’s warehouse data is stored in a few large Hadoop/HDFS-based clusters. Hadoop MapReduce [2] and Hive are designed for large-scale, reliable computation, and are optimized for overall system throughput. But as our warehouse grew to petabyte scale and our needs evolved, it became clear that we needed an interactive system optimized for low query latency.


In Fall 2012, a small team in the Facebook Data Infrastructure group set out to solve this problem for our warehouse users. We evaluated a few external projects, but they were either too nascent or did not meet our requirements for flexibility and scale. So we decided to build Presto, a new interactive query system that could operate fast at petabyte scale.


In this post, we will briefly describe the architecture of Presto, its current status, and future roadmap.


Architecture

Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. 


The diagram below shows the simplified system architecture of Presto. The client sends SQL to the Presto coordinator. The coordinator parses, analyzes, and plans the query execution. The scheduler wires together the execution pipeline, assigns work to nodes closest to the data, and monitors progress. The client pulls data from the output stage, which in turn pulls data from underlying stages.
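To make the client side of this flow concrete, here is a minimal sketch of submitting a query over JDBC. It is illustrative only: it assumes the Presto JDBC driver is on the classpath and accepts a jdbc:presto:// URL of this shape, that a coordinator is listening on localhost:8080, and that hypothetical hive.default tables named orders and customers exist. The query itself exercises a join, an aggregation, and a window function.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Minimal, illustrative Presto client: the SQL text is sent to the coordinator,
// which plans and schedules the query; result rows stream back to the client.
public class PrestoClientExample {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("user", "analyst"); // queries are attributed to a user

        // Hypothetical coordinator address, catalog, and schema.
        String url = "jdbc:presto://localhost:8080/hive/default";

        // A query using a join, an aggregation, and a window function
        // over hypothetical orders/customers tables.
        String sql = "SELECT c.country, " +
                "       sum(o.total) AS revenue, " +
                "       rank() OVER (ORDER BY sum(o.total) DESC) AS revenue_rank " +
                "FROM orders o " +
                "JOIN customers c ON o.customer_id = c.customer_id " +
                "GROUP BY c.country";

        try (Connection connection = DriverManager.getConnection(url, properties);
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            while (rows.next()) {
                System.out.println(rows.getString("country") + "\t"
                        + rows.getString("revenue") + "\t"
                        + rows.getLong("revenue_rank"));
            }
        }
    }
}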


The execution model of Presto is fundamentally different from Hive/MapReduce. Hive translates queries into multiple stages of MapReduce tasks that execute one after another. Each task reads inputs from disk and writes intermediate output back to disk. In contrast, the Presto engine does not use MapReduce. It employs a custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between stages. This avoids unnecessary I/O and associated latency overhead. The pipelined execution model runs multiple stages at once, and streams data from one stage to the next as it becomes available. This significantly reduces end-to-end latency for many types of queries.
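The toy sketch below (not Presto code) shows the pipelining idea in miniature: two operators are chained as a lazy iterator, so each row flows through the filter and the projection the moment it is produced, with no intermediate result materialized between the "stages".

import java.util.Iterator;
import java.util.List;

// Toy illustration of pipelined execution: rows stream from one operator to the
// next instead of being fully materialized (or written to disk) in between.
public final class PipelinedOperators {

    // Wraps a source of rows with a "filter" and a "project" operator.
    // Each call to next() pulls exactly one row through the whole pipeline.
    static Iterator<Long> filterAndProject(Iterator<Long> source) {
        return new Iterator<Long>() {
            private Long buffered = advance();

            private Long advance() {
                while (source.hasNext()) {
                    long row = source.next();
                    if (row % 2 == 0) {     // filter: keep even rows
                        return row * 10;    // project: scale the value
                    }
                }
                return null;                // source exhausted
            }

            @Override public boolean hasNext() { return buffered != null; }

            @Override public Long next() {
                Long current = buffered;
                buffered = advance();
                return current;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Long> rows = List.of(1L, 2L, 3L, 4L, 5L, 6L).iterator();
        // Prints 20, 40, 60 -- each produced on demand, with no intermediate list.
        filterAndProject(rows).forEachRemaining(System.out::println);
    }
}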




The Presto system is implemented in Java because it’s fast to develop, has a great ecosystem, and is easy to integrate with the rest of the data infrastructure components at Facebook that are primarily built in Java. Presto dynamically compiles certain portions of the query plan down to byte code which lets the JVM optimize and generate native machine code. Through careful use of memory and data structures, Presto avoids typical issues of Java code related to memory allocation and garbage collection. (In a later post, we will share some tips and tricks for writing high-performance Java system code and the lessons learned while building Presto.)
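To illustrate the code-generation point in ordinary Java (this is not Presto's actual code generator), the sketch below shows the shape of the transformation: rather than interpreting an expression tree for every row, the engine emits a class whose body is equivalent to the hand-written filter here, which the JVM can then JIT-compile into tight native code.

// Illustrative only -- not Presto's code generator. The generated bytecode for a
// predicate like "total_price > 100" behaves like this hand-written class.
interface RowFilter {
    boolean matches(long totalPrice);
}

final class GeneratedTotalPriceFilter implements RowFilter {
    @Override
    public boolean matches(long totalPrice) {
        // The constant is baked in at generation time, so per-row evaluation
        // is a single comparison the JIT can optimize aggressively.
        return totalPrice > 100;
    }
}

public class CodeGenSketch {
    public static void main(String[] args) {
        RowFilter filter = new GeneratedTotalPriceFilter();
        long matches = 0;
        for (long price = 0; price < 1_000_000; price++) {
            if (filter.matches(price)) {
                matches++;
            }
        }
        System.out.println(matches); // 999899
    }
}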


Extensibility is another key design point for Presto. During the initial phase of the project, we realized that large data sets were being stored in many other systems in addition to HDFS. Some data stores are well-known systems such as HBase, but others are custom systems such as the Facebook News Feed backend. Presto was designed with a simple storage abstraction that makes it easy to provide SQL query capability against these disparate data sources. Storage plugins (called connectors) only need to provide interfaces for fetching metadata, getting data locations, and accessing the data itself. In addition to the primary Hive/HDFS backend, we have built Presto connectors to several other systems, including HBase, Scribe, and other custom systems.
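The sketch below outlines that storage abstraction in a simplified, hypothetical form. It is not the actual Presto connector SPI, but it shows the three responsibilities a connector covers: describing metadata, reporting where the data lives, and reading it.

import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified sketch of a storage connector -- not the real Presto SPI.
// A connector answers three questions: what tables and columns exist, where the
// data for a table lives (splits), and how to read the rows of a split.
interface ConnectorSketch {

    // Metadata: tables and their schemas.
    List<String> listTables(String schema);
    List<ColumnInfo> getColumns(String table);

    // Data locations: divide the table into independently readable chunks, each
    // annotated with the hosts that hold the data so the scheduler can assign
    // work close to it.
    List<Split> getSplits(String table);

    // Data access: stream the rows of one split.
    Iterator<Object[]> readSplit(Split split);

    record ColumnInfo(String name, String type) {}

    record Split(String table, int partition, List<String> preferredHosts) {}
}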




Current status

As mentioned above, development on Presto started in Fall 2012. We had our first production system up and running in early 2013. It was fully rolled out to the entire company by Spring 2013. Since then, Presto has become a major interactive system for the company’s data warehouse. It is deployed in multiple geographical regions and we have successfully scaled a single cluster to 1,000 nodes. The system is actively used by over a thousand employees, who run more than 30,000 queries processing one petabyte daily. 


Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook. It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limitation on the join tables and on the cardinality of unique keys/groups. The system also lacks the ability to write output data back to tables (currently query results are streamed to the client).
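As a rough illustration of why an approximate distinct count is so much cheaper than an exact one, the sketch below implements the core HyperLogLog idea in a few dozen lines. It is not Presto's implementation, and it omits the bias and small-range corrections a production version needs: each value is hashed, a few bits of the hash pick a register, and each register keeps only the maximum number of leading zero bits it has seen.

// Minimal HyperLogLog-style distinct-count sketch (illustration only).
public final class HllSketch {
    private static final int INDEX_BITS = 11;            // 2^11 = 2048 registers
    private static final int REGISTERS = 1 << INDEX_BITS;

    private final byte[] registers = new byte[REGISTERS];

    public void add(long value) {
        long hash = mix(value);
        int index = (int) (hash >>> (64 - INDEX_BITS));  // top bits pick a register
        long remaining = hash << INDEX_BITS;              // rest of the hash
        int rank = remaining == 0
                ? 64 - INDEX_BITS + 1
                : Long.numberOfLeadingZeros(remaining) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;               // keep the max rank seen
        }
    }

    public long estimate() {
        double alpha = 0.7213 / (1 + 1.079 / REGISTERS);  // standard HLL constant
        double sum = 0;
        for (byte register : registers) {
            sum += Math.pow(2, -register);                // harmonic mean of 2^rank
        }
        return Math.round(alpha * REGISTERS * REGISTERS / sum);
    }

    // 64-bit finalizer-style mixer so nearby inputs hash to well-spread values.
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    public static void main(String[] args) {
        HllSketch sketch = new HllSketch();
        for (long i = 0; i < 1_000_000; i++) {
            sketch.add(i);
        }
        // Typically within a few percent of the true count of 1,000,000.
        System.out.println(sketch.estimate());
    }
}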


Roadmap

We are actively working on extending Presto functionality and improving performance. In the next few months, we will remove restrictions on join and aggregation sizes and introduce the ability to write output tables. We are also working on a query “accelerator” by designing a new data format that is optimized for query processing and avoids unnecessary transformations. This feature will allow hot subsets of data to be cached from the backend data store, and the system will transparently use cached data to “accelerate” queries. We are also working on a high-performance HBase connector.


Open source

After our initial Presto announcement at the Analytics @ WebScale conference in June 2013 [3], there has been a lot of interest from the external community. In the last couple of months, we have released Presto code and binaries to a small number of external companies. They have successfully deployed and tested it within their environments and given us great feedback.


Today we are very happy to announce that we are open-sourcing Presto. You can check out the code and documentation on the site below. We look forward to hearing about your use cases and how Presto can help with your interactive analysis.


http://prestodb.io/

https://github.com/facebook/presto


The Presto team within Facebook Data Infrastructure consists of Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang, Nileema Shingte and Ravi Murthy.


Links

[1] Scaling Apache Giraph to a trillion edges. https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

[2] Under the hood: Scheduling MapReduce jobs more efficiently with Corona. https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

[3] Video of Presto talk at Analytics@Webscale conference, June 2013. https://www.facebook.com/photo.php?v=10202463462128185

