Basic components of SQL
Projection (the SELECT list), Data Source (the FROM clause), Filter (the WHERE clause)
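SQL parsing
The SQL statement is first parsed by the Parser module into an Unresolved Logical Plan.
The Analyzer module then resolves the Unresolved Logical Plan into a Logical Plan, using the table information held in the Catalog.
The Optimizer applies a series of rule-based optimization strategies to that plan, producing an Optimized Logical Plan.
The optimized plan is still only a logical description and cannot be executed by Spark directly, so it must finally be converted into a Physical Plan.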
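The four stages above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the node classes, the functions `parse`, `analyze`, `optimize` and `to_physical`, the dict-based catalog, and the single optimization rule (dropping a filter whose condition is the literal `true`) are all hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass


# Toy plan nodes. The names loosely mirror the concepts above; they are not Spark classes.
@dataclass(frozen=True)
class UnresolvedRelation:
    name: str


@dataclass(frozen=True)
class Relation:
    name: str
    columns: tuple


@dataclass(frozen=True)
class Filter:
    condition: str
    child: object


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


def parse(sql):
    """Parser stage: text -> unresolved logical plan (supports SELECT .. FROM .. [WHERE ..] only)."""
    pattern = r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+?))?\s*"
    m = re.fullmatch(pattern, sql, re.IGNORECASE)
    if m is None:
        raise ValueError(f"cannot parse: {sql!r}")
    columns = tuple(c.strip() for c in m.group(1).split(","))
    plan = UnresolvedRelation(m.group(2))      # Data Source
    if m.group(3):
        plan = Filter(m.group(3), plan)        # Filter
    return Project(columns, plan)              # Projection


def output_columns(plan):
    """Columns produced by an already analyzed plan node."""
    if isinstance(plan, Relation):
        return plan.columns
    if isinstance(plan, Filter):
        return output_columns(plan.child)
    return plan.columns


def analyze(plan, catalog):
    """Analyzer stage: resolve table names and projected columns against the catalog."""
    if isinstance(plan, UnresolvedRelation):
        if plan.name not in catalog:
            raise LookupError(f"table or view not found: {plan.name}")
        return Relation(plan.name, tuple(catalog[plan.name]))
    if isinstance(plan, Filter):
        return Filter(plan.condition, analyze(plan.child, catalog))
    if isinstance(plan, Project):
        child = analyze(plan.child, catalog)
        available = output_columns(child)
        columns = available if plan.columns == ("*",) else plan.columns
        missing = [c for c in columns if c not in available]
        if missing:
            raise LookupError(f"cannot resolve columns: {missing}")
        return Project(columns, child)
    return plan


def optimize(plan):
    """Optimizer stage: one toy rule that removes filters whose condition is literally `true`."""
    if isinstance(plan, Filter):
        child = optimize(plan.child)
        if plan.condition.strip().lower() == "true":
            return child
        return Filter(plan.condition, child)
    if isinstance(plan, Project):
        return Project(plan.columns, optimize(plan.child))
    return plan


def to_physical(plan):
    """Planning stage: render an executable operator tree (as a string in this toy)."""
    if isinstance(plan, Relation):
        return f"Scan[{plan.name}]"
    if isinstance(plan, Filter):
        return f"Filter[{plan.condition}]({to_physical(plan.child)})"
    if isinstance(plan, Project):
        return f"Project[{', '.join(plan.columns)}]({to_physical(plan.child)})"
    raise TypeError(f"cannot plan unresolved node: {plan!r}")


# Usage: run one query through all four stages.
catalog = {"people": ["name", "age"]}
unresolved = parse("SELECT name FROM people WHERE age > 18")
print(to_physical(optimize(analyze(unresolved, catalog))))
```

Note that `to_physical` refuses a plan that still contains an `UnresolvedRelation`, which mirrors why the analysis step has to come before planning.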
Exploring the source code
SessionCatalog
An internal catalog that is used by a Spark Session. This internal catalog serves as a proxy to the underlying metastore (e.g. Hive Metastore) and it also manages temporary views and functions of the Spark Session that it belongs to.
This class must be thread-safe.
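The description above can be pictured as a thin, lock-protected layer in front of an external metastore. The following is a minimal sketch in plain Python, not Spark's implementation: `InMemoryMetastore`, `ToySessionCatalog` and all method names are hypothetical, and the choice to let a temporary view shadow a metastore table of the same name is an assumption made for this example.

```python
import threading


class InMemoryMetastore:
    """Hypothetical stand-in for an external metastore such as the Hive Metastore."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def get_table(self, name):
        return self._tables.get(name)


class ToySessionCatalog:
    """Proxies table lookups to the metastore and owns session-local temporary views."""

    def __init__(self, metastore):
        self._metastore = metastore
        self._temp_views = {}
        self._lock = threading.Lock()  # guards all access to _temp_views

    def create_temp_view(self, name, plan, replace=False):
        with self._lock:
            if name in self._temp_views and not replace:
                raise ValueError(f"temporary view already exists: {name}")
            self._temp_views[name] = plan

    def drop_temp_view(self, name):
        with self._lock:
            return self._temp_views.pop(name, None) is not None

    def temp_view_names(self):
        with self._lock:
            return sorted(self._temp_views)

    def lookup_relation(self, name):
        # Assumption of this sketch: session-local views are checked first.
        with self._lock:
            if name in self._temp_views:
                return ("temp_view", self._temp_views[name])
        table = self._metastore.get_table(name)
        if table is None:
            raise LookupError(f"table or view not found: {name}")
        return ("table", table)


# Usage: a temporary view hides the metastore table until it is dropped.
session_catalog = ToySessionCatalog(InMemoryMetastore({"people": ["name", "age"]}))
session_catalog.create_temp_view("people", "SELECT 1 AS name")
print(session_catalog.lookup_relation("people"))
session_catalog.drop_temp_view("people")
print(session_catalog.lookup_relation("people"))
```

A single lock around the mutable view registry is the simplest way to meet the thread-safety requirement in this toy; the metastore call is kept outside the lock so a slow external lookup does not block other threads that only touch temporary views.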
UnresolvedRelation