Pig explain in Detail

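This article walks through what happens when a Pig script runs a data-processing task, from loading the data through the logical plan and the physical plan to the MapReduce plan, with detailed explanations of operations such as sampling and sorting.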


Consider the following code:

 b = load '/in_off/tree/20140101/*' as (date,uid);
 c = sample  b 0.01;
 d = limit c 10 ;

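Let's run explain on each step in turn. (The grunt session below uses a sampling fraction of 0.001 and an order by uid in place of the limit.)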


explain b;

2014-06-10 10:09:50,697 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0

2014-06-10 10:09:50,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------

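First, the logical plan, which shows how b is produced logically:
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------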
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]

Next, the physical plan:

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0

Finally, what does the MapReduce plan look like?

2014-06-10 10:09:50,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
As you can see, a plain load needs only a map phase; there is no reduce phase.
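
The check done by eye above (does a job have a Reduce Plan section or not?) can be automated. The following is a minimal sketch, assuming the explain output has been captured as text; the helper name `summarize_mr_plan` is hypothetical, and it relies only on the `MapReduce node scope-N` and `Reduce Plan` lines visible in the output above.

```python
import re

def summarize_mr_plan(explain_text):
    """Return one (scope_id, has_reduce) pair per MapReduce node in the text."""
    jobs = []
    for raw_line in explain_text.splitlines():
        line = raw_line.strip()
        match = re.match(r"MapReduce node (scope-\d+)", line)
        if match:
            # A new job starts; assume map-only until a Reduce Plan shows up.
            jobs.append([match.group(1), False])
        elif line == "Reduce Plan" and jobs:
            jobs[-1][1] = True
    return [tuple(job) for job in jobs]

# The MapReduce plan printed by "explain b" above.
sample_output = """\
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
"""

print(summarize_mr_plan(sample_output))  # [('scope-2', False)]
```

A result of `False` for a node means the job is map-only, which matches what the plan for b shows.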

Let's continue. Sample b, keeping a fraction of 0.001 of the rows (about 0.1%).

grunt> c  = sample b 0.001;
2014-06-10 10:10:42,092 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain c;          
2014-06-10 10:10:46,421 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
Read the logical plan from the bottom up: b sits at the bottom, a filter is applied to it, and the result is c.

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
c: (Name: LOStore Schema: date#36:bytearray,uid#37:bytearray)
|
|---c: (Name: LOFilter Schema: date#36:bytearray,uid#37:bytearray)
    |   |
    |   (Name: LessThan Type: boolean Uid: 43)
    |   |
    |   |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 41)
    |   |
    |   |---(Name: Constant Type: double Uid: 42)
    |
    |---b: (Name: LOLoad Schema: date#36:bytearray,uid#37:bytearray)RequiredFields:[0, 1]

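In the physical plan, note RANDOM: the sample is executed as a filter that keeps a row when RANDOM() is less than the constant 0.001.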
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
    |   |
    |   Less Than[boolean] - scope-7
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
    |   |
    |   |---Constant(0.0010) - scope-6
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3

2014-06-10 10:10:46,477 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:10:46,481 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:10:46,481 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

Again, no reduce phase is used.
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-9
Map Plan
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
    |   |
    |   Less Than[boolean] - scope-7
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
    |   |
    |   |---Constant(0.0010) - scope-6
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3--------
Global sort: false
----------------
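
The plan above shows the sample as a per-row filter, which is why it fits entirely in the map phase. Below is a minimal Python sketch of that idea, not Pig's actual implementation; the function name `sample` and the generated rows are illustrative assumptions.

```python
import random

def sample(rows, fraction, rng=None):
    """Keep each row independently when a uniform random draw is below fraction."""
    if rng is None:
        rng = random.Random()
    # Mirrors the plan: Filter -> Less Than -> RANDOM() vs. Constant(fraction).
    return [row for row in rows if rng.random() < fraction]

# Made-up rows shaped like (date, uid); the real input lives in HDFS.
rows = [("20140101", "uid%d" % i) for i in range(100000)]
kept = sample(rows, 0.001, random.Random(42))
print(len(kept))  # about 100 on average; the exact count varies from run to run
```

Because each row is decided independently, the result size is only approximately fraction times the input size, and no data needs to move between machines.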

Now let's bring in a reduce phase by adding an order.

grunt> d = order c by uid;
2014-06-10 10:13:12,689 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain d;
2014-06-10 10:13:17,037 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
d: (Name: LOStore Schema: date#64:bytearray,uid#65:bytearray)
|
|---d: (Name: LOSort Schema: date#64:bytearray,uid#65:bytearray)
    |   |
    |   uid:(Name: Project Type: bytearray Uid: 65 Input: 0 Column: 1)
    |
    |---c: (Name: LOFilter Schema: date#64:bytearray,uid#65:bytearray)
        |   |
        |   (Name: LessThan Type: boolean Uid: 71)
        |   |
        |   |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 69)
        |   |
        |   |---(Name: Constant Type: double Uid: 70)
        |
        |---b: (Name: LOLoad Schema: date#64:bytearray,uid#65:bytearray)RequiredFields:[0, 1]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---d: POSort[bag]() - scope-16
    |   |
    |   Project[bytearray][1] - scope-15
    |
    |---c: Filter[bag] - scope-11
        |   |
        |   Less Than[boolean] - scope-14
        |   |
        |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
        |   |
        |   |---Constant(0.0010) - scope-13
        |
        |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10

2014-06-10 10:13:17,052 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:13:17,089 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-06-10 10:13:17,089 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-18
Map Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-19
|
|---c: Filter[bag] - scope-11
    |   |
    |   Less Than[boolean] - scope-14
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
    |   |
    |   |---Constant(0.0010) - scope-13
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10--------
Global sort: false
----------------

MapReduce node scope-21
Map Plan
d: Local Rearrange[tuple]{tuple}(false) - scope-25
|   |
|   Constant(all) - scope-24
|
|---New For Each(false)[tuple] - scope-23
    |   |
    |   Project[bytearray][1] - scope-22
    |
    |---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100')) - scope-20--------
Reduce Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-67539995:org.apache.pig.impl.io.InterStorage) - scope-34
|
|---New For Each(false)[tuple] - scope-33
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] - scope-32
    |   |
    |   |---Project[tuple][*] - scope-31
    |
    |---New For Each(false,false)[tuple] - scope-30
        |   |
        |   Constant(-1) - scope-29
        |   |
        |   Project[bag][1] - scope-27
        |
        |---Package[tuple]{chararray} - scope-26--------
Global sort: false
Secondary sort: true
----------------

MapReduce node scope-36
Map Plan
d: Local Rearrange[tuple]{bytearray}(false) - scope-37
|   |
|   Project[bytearray][1] - scope-15
|
|---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-35--------
Reduce Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---New For Each(true)[tuple] - scope-40
    |   |
    |   Project[bag][1] - scope-39
    |
    |---PackageLite[tuple]{bytearray} - scope-38--------
Global sort: true

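As you can see, the reduce phase is now used. The order compiles into three MapReduce jobs: the first (scope-18) is map-only and writes the filtered rows to a temporary file; the second (scope-21) samples that file with RandomSampleLoader and runs FindQuantiles in its reduce phase; the third (scope-36) performs the actual sort, which is why it reports Global sort: true.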
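
The sample, quantile, and sort stages in the plan above follow a common pattern for a distributed total order: sample the keys, pick boundary keys, route every row to a key range, then sort each range. The sketch below illustrates that pattern in plain Python under stated assumptions. It is not Pig's code; `find_quantiles`, `global_sort`, the partition count of 3, and the sample size of 100 are all hypothetical choices.

```python
import bisect
import random

def find_quantiles(sampled_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of the sort keys."""
    keys = sorted(sampled_keys)
    if not keys:
        return []
    return [keys[(i * len(keys)) // num_partitions]
            for i in range(1, num_partitions)]

def global_sort(rows, key, num_partitions=3, sample_size=100, rng=None):
    """Sort rows by key through range partitioning, one sorted run per partition."""
    if rng is None:
        rng = random.Random()
    # Sampling stage: look at a few keys only, then derive range boundaries.
    picked = rng.sample(rows, min(sample_size, len(rows)))
    boundaries = find_quantiles([key(r) for r in picked], num_partitions)
    # Map side of the sort stage: send each row to the partition owning its range.
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[bisect.bisect_right(boundaries, key(row))].append(row)
    # Reduce side: sort inside each partition; concatenation is globally ordered
    # because every key in partition i is smaller than every key in partition i + 1.
    result = []
    for part in partitions:
        result.extend(sorted(part, key=key))
    return result

# Made-up rows shaped like (date, uid), shuffled before sorting by uid.
rows = [("20140101", "uid%05d" % i) for i in range(1000)]
random.Random(7).shuffle(rows)
ordered = global_sort(rows, key=lambda r: r[1], rng=random.Random(0))
print(ordered[0], ordered[-1])
```

The sampling stage exists so that the boundaries split the data into ranges of similar size; the final order is correct for any boundaries, but balanced ranges keep the reducers evenly loaded.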








