Connectors
Connectors are the "data sources" of Presto queries. Even if the underlying data source has no actual tables, its data can be queried as long as the connector implements the APIs Presto requires.
ConnectorFactory
The plugin's getConnectorFactory() returns a ConnectorFactory, which is used to create a Connector instance that exposes:
ConnectorMetadata
ConnectorSplitManager
ConnectorHandleResolver
ConnectorRecordSetProvider
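The plugin -> factory -> connector chain above can be sketched as follows. This is a simplified illustration with stand-in types, not the real presto-spi interfaces (the actual SPI methods take transaction handles, session properties, and configuration maps):

```java
import java.util.List;

// Minimal stand-ins for the presto-spi interfaces; names mirror the SPI,
// but these empty versions exist only to show the wiring.
interface ConnectorMetadata {}
interface ConnectorSplitManager {}
interface ConnectorRecordSetProvider {}

interface Connector {
    ConnectorMetadata getMetadata();
    ConnectorSplitManager getSplitManager();
    ConnectorRecordSetProvider getRecordSetProvider();
}

interface ConnectorFactory {
    String getName();
    Connector create(String connectorId);
}

public class PluginWiring {
    // A plugin exposes factories; Presto selects one by the catalog's
    // connector.name property and calls create() to build the Connector.
    static List<ConnectorFactory> getConnectorFactories() {
        return List.of(new ConnectorFactory() {
            public String getName() { return "example-http"; }
            public Connector create(String connectorId) {
                return new Connector() {
                    public ConnectorMetadata getMetadata() { return new ConnectorMetadata() {}; }
                    public ConnectorSplitManager getSplitManager() { return new ConnectorSplitManager() {}; }
                    public ConnectorRecordSetProvider getRecordSetProvider() { return new ConnectorRecordSetProvider() {}; }
                };
            }
        });
    }

    public static void main(String[] args) {
        ConnectorFactory factory = getConnectorFactories().get(0);
        Connector connector = factory.create("example");
        System.out.println(factory.getName() + " -> " + (connector.getMetadata() != null));
    }
}
```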
ConnectorMetadata
Provides Presto with the methods for listing a particular data source's schemas, tables, columns, and other metadata. For reference implementations, see the Example HTTP Connector and the Cassandra connector.
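The listing methods can be illustrated with a nested map, SchemaName -> (TableName -> columns) — the same shape ExampleClient uses further below. This is a self-contained sketch, not the real ConnectorMetadata interface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of ConnectorMetadata-style listing backed by a
// nested map: SchemaName -> (TableName -> column names).
public class MetadataSketch {
    private final Map<String, Map<String, List<String>>> schemas;

    MetadataSketch(Map<String, Map<String, List<String>>> schemas) {
        this.schemas = schemas;
    }

    List<String> listSchemaNames() {
        return new ArrayList<>(schemas.keySet());
    }

    List<String> listTables(String schema) {
        Map<String, List<String>> tables = schemas.get(schema);
        return tables == null ? List.of() : new ArrayList<>(tables.keySet());
    }

    List<String> listColumns(String schema, String table) {
        return schemas.getOrDefault(schema, Map.of()).getOrDefault(table, List.of());
    }

    public static void main(String[] args) {
        MetadataSketch metadata = new MetadataSketch(Map.of(
                "example", Map.of("numbers", List.of("text", "value"))));
        System.out.println(metadata.listSchemaNames());              // [example]
        System.out.println(metadata.listTables("example"));          // [numbers]
        System.out.println(metadata.listColumns("example", "numbers")); // [text, value]
    }
}
```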
ConnectorSplitManager
The SplitManager partitions a table's data into chunks (splits) that Presto distributes to workers for processing.
For example:
- The Hive connector lists the files in each Hive partition and creates one or more splits per file.
- For unpartitioned data, a reasonably good strategy is to treat the entire table as a single split (this is the approach taken by the Example HTTP connector).
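Both strategies can be sketched in a few lines. The Split type here is a hypothetical simplification (a real ConnectorSplit also carries host addresses and accessibility flags):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two split strategies described above.
public class SplitSketch {
    // Simplified stand-in for ConnectorSplit: just the data location.
    static final class Split {
        final URI source;
        Split(URI source) { this.source = source; }
    }

    // Hive-style: one split per file listed under the table's partitions.
    static List<Split> splitsPerFile(List<URI> partitionFiles) {
        List<Split> splits = new ArrayList<>();
        for (URI file : partitionFiles) {
            splits.add(new Split(file));
        }
        return splits;
    }

    // Unpartitioned data: the whole table as a single split
    // (the strategy the Example HTTP connector uses).
    static List<Split> singleSplit(URI tableSource) {
        return List.of(new Split(tableSource));
    }
}
```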
ConnectorRecordSetProvider
Given a split and a list of columns, the RecordSetProvider is responsible for delivering the data to Presto's execution engine.
It builds a RecordSet, which in turn creates a RecordCursor that Presto uses to read the column values row by row (similar to JDBC).
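The cursor pattern looks like the following in-memory sketch: the cursor is advanced one position at a time and column values are read by field index, much like iterating a JDBC ResultSet. These are simplified stand-ins, not the real SPI types:

```java
import java.util.List;

// Illustrative sketch of the RecordSet -> RecordCursor reading pattern.
public class CursorSketch {
    interface RecordCursor {
        boolean advanceNextPosition(); // move to the next row, false at end
        String getString(int field);   // read a column of the current row
        long getLong(int field);
    }

    // Hypothetical in-memory cursor over pre-materialized rows.
    static class InMemoryCursor implements RecordCursor {
        private final List<Object[]> rows;
        private int position = -1;

        InMemoryCursor(List<Object[]> rows) { this.rows = rows; }

        public boolean advanceNextPosition() { return ++position < rows.size(); }
        public String getString(int field) { return (String) rows.get(position)[field]; }
        public long getLong(int field) { return (Long) rows.get(position)[field]; }
    }

    public static void main(String[] args) {
        RecordCursor cursor = new InMemoryCursor(List.of(
                new Object[]{"one", 1L},
                new Object[]{"two", 2L}));
        // The engine drives this loop: advance, then read each column.
        while (cursor.advanceNextPosition()) {
            System.out.println(cursor.getString(0) + " | " + cursor.getLong(1));
        }
    }
}
```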
Example HTTP Connector
example.properties
presto-main/etc/catalog/example.properties
connector.name=example-http
metadata-uri=http://s3.amazonaws.com/presto-example/v2/example-metadata.json
Sample queries
presto:default> show schemas from example;
Schema
--------------------
example
information_schema
tpch
(3 rows)
presto:default> show tables from example.example;
Table
---------
numbers
(1 row)
presto:default> select * from example.example.numbers;
text | value
-------+-------
one | 1
two | 2
three | 3
ten | 10
eleven | 11
twelve | 12
(6 rows)
presto:default> show tables from example.information_schema;
Table
-------------------------
__internal_partitions__
columns
schemata
table_privileges
tables
views
(6 rows)
Example HTTP connector code
//==column
public final class ExampleColumn
{
private final String name;
private final Type type;
public final class ExampleColumnHandle
implements ColumnHandle
{
private final String connectorId;
private final String columnName;
private final Type columnType;
private final int ordinalPosition;
//=======table
public class ExampleTable
{
private final String name;
private final List<ExampleColumn> columns;
private final List<ColumnMetadata> columnsMetadata;
private final List<URI> sources;
public final class ExampleTableHandle
implements ConnectorTableHandle
{
private final String connectorId;
private final String schemaName;
private final String tableName;
public class ExampleTableLayoutHandle
implements ConnectorTableLayoutHandle
{
private final ExampleTableHandle table;
//=======split
public class ExampleSplit
implements ConnectorSplit
{
private final String connectorId;
private final String schemaName;
private final String tableName;
private final URI uri;
private final boolean remotelyAccessible;
private final List<HostAddress> addresses;
public class ExampleSplitManager
implements ConnectorSplitManager
{
private final String connectorId;
private final ExampleClient exampleClient;
public ConnectorSplitSource getSplits(...){
}
//===Record
public class ExampleRecordSetProvider
implements ConnectorRecordSetProvider{
public RecordSet getRecordSet(...) {
// build a RecordSet from the split and the requested columns
return new ExampleRecordSet(exampleSplit, handles.build());
}
}
public class ExampleRecordSet
implements RecordSet
{
private final List<ExampleColumnHandle> columnHandles; // all of the columns
private final List<Type> columnTypes; // types of the columns
private final ByteSource byteSource;
@Override
public List<Type> getColumnTypes()
{
return columnTypes;
}
@Override
public RecordCursor cursor()
{
return new ExampleRecordCursor(columnHandles, byteSource);
}
public class ExampleRecordCursor
implements RecordCursor
{
// iterates row by row, returning the column values of each row
}
//=====Plugin
// mainly parses file paths to organize them into schemas and tables
public class ExampleClient
{
/**
* SchemaName -> (TableName -> TableMetadata)
*/
private final Supplier<Map<String, Map<String, ExampleTable>>> schemas;
}
// a facade over ExampleClient
public class ExampleMetadata
implements ConnectorMetadata
{
private final String connectorId;
private final ExampleClient exampleClient;
}
// returns an ExampleConnectorFactory
public class ExamplePlugin
implements Plugin
{
@Override
public Iterable<ConnectorFactory> getConnectorFactories()
{
return ImmutableList.of(new ExampleConnectorFactory());
}
}
//=====Connector
public class ExampleConnector
implements Connector
{
private static final Logger log = Logger.get(ExampleConnector.class);
private final LifeCycleManager lifeCycleManager;
private final ExampleMetadata metadata;
private final ExampleSplitManager splitManager;
private final ExampleRecordSetProvider recordSetProvider;
}
Miscellaneous
Presto's connector architecture creates an abstraction layer for anything that can be represented in a columnar or row-like format, such as HDFS, Amazon S3, Azure Storage, NoSQL stores, relational databases, Kafka streams, and even proprietary data stores.
Today Presto is not capable of pushing down aggregations and joins into MySQL. In many cases a simple workaround for this limitation is the creation of views inside MySQL that will be referenced by Presto queries. Such views should contain aggregations and/or joins of MySQL tables. The views are processed inside MySQL (along with any column/filters pushed down by Presto) and the resulting intermediate data is streamed back to Presto for the final processing.
Views are expanded inline during query analysis, so querying a view behaves exactly as if you had written the view's definition as a subquery.