Making up Data with Partition Outer Join

稀疏数据处理
本文介绍如何使用Oracle SQL中的分区外部连接解决稀疏数据问题,确保每个经理的状态都包含所有可能的类别,即使某些状态在原始数据中缺失。

refer:

http://nuijten.blogspot.com/2009/07/making-up-data-with-partition-outer.html

 

Just the other day on the Oracle SQL and PL/SQL forum, someone asked on how to create non-existent rows. This post is not about generating a number of rows.
It's about handling Sparse Data, when you want to fill in some missing records in your result set.
First time I heard about this was in a blog written by Lucas Jellema.
Let's first start out with the table and some test data:

01. SQL> create table test_table1
02.   2  (status varchar2(15)
03.   3  ,manager number
04.   4  ,sales number);
05.  
06. Table created.
07.  
08. SQL> insert into test_table1 values ('in process', 14, 100);
09.  
10. 1 row created.
11.  
12. SQL> insert into test_table1 values ('in process', 15, 10);
13.  
14. 1 row created.
15.  
16. SQL> insert into test_table1 values ('in process', 15, 40);
17.  
18. 1 row created.
19.  
20. SQL> insert into test_table1 values ('done', 14, 200);
21.  
22. 1 row created.
23.  
24. SQL> insert into test_table1 values ('done', 16, 50);
25.  
26. 1 row created.
27. SQL> select *
28.   2    from test_table1
29.   3  /
30.  
31. STATUS             MANAGER      SALES
32. --------------- ---------- ----------
33. in process              14        100
34. in process              15         10
35. in process              15         40
36. done                    14        200
37. done                    16         50


As you can see in the sample data, Manager 14 has entries for the status "in process" and "done". Manager 15 only has entries for "in process". Manager 16 only has a single entry for "done".
The result that we are after is to show for each Manager a total sales value for both statuses "in process" and "done".
When we use a regular SUM and GROUP BY:

01. SQL> select manager
02.   2       , status
03.   3       , sum(sales)
04.   4    from test_table1
05.   5   group by manager
06.   6          , status
07.   7   order by manager
08.   8          , status
09.   9  /
10.  
11.    MANAGER STATUS          SUM(SALES)
12. ---------- --------------- ----------
13.         14 done                   200
14.         14 in process             100
15.         15 in process              50
16.         16 done                    50


we only see values for records that are actually in the table... Go Figure!

Nice results, but not exactly what we are after. We want an extra record for Manager 15 (with status "done" and sales value of 0) and an extra record for Manager 16 (with status "in process" and also a value of 0).

One way to tackle this problem (or challenge if you prefer) is to use a Partition Outer Join. As far as i know this is not ANSI-SQL, but Oracle specific syntax. Tahiti.Oracle.com calls it an "extension to the ANSI syntax".
To make this query work, we need a "table" (or inline view) which has all possible statuses. Something like

01. SQL> select 'done' st from dual
02.   union all
03.   select 'in process' from dual
04.   4  /
05.  
06. ST
07. ----------
08. done
09. in process


This inline view will be outer joined to our table.
What makes a Partition Outer Join work differently from a regular Outer Join?
A regular Outer Join will show an extra single record even when a matching value is not present. In our case, this will not make a difference as the values "done" and "in process" are present in our base table.
What we want is to outer join all statuses from the inline view to our base table for each manager.
And this is exactly what the Partition Clause does. It breaks up the result set per manager. Per partition (one for Manager 14, one for Manager 15 and one for Manager 16) we want to outer join to the inline view.

Putting it all together, and here is the final result:

01. SQL> select manager
02.   2       , st
03.   3       , nvl (sum (sales) , 0)
04.   4    from test_table1 t partition by (manager)
05.   5   right outer
06.   6    join (select 'done' st from dual
07.   7          union all
08.   8          select 'in process' from dual
09.   9         ) sts
10.  10      on (t.status = sts.st)
11.  11   group by manager
12.  12          , st
13.  13   order by manager
14.  14          , st
15.  15  /
16.  
17.    MANAGER ST         NVL(SUM(SALES),0)
18. ---------- ---------- -----------------
19.         14 done                     600
20.         14 in process               300
21.         15 done                       0
22.         15 in process               150
23.         16 done                     150
24.         16 in process                 0
25.  
26. 6 rows selected.


Each Manager shows an entry for both statuses "done" and "in process", even when this value is not in the base table.

If -for whatever reason- you don't like RIGHT OUTER, just flip the tables around and call it a LEFT OUTER:

01. SQL> select manager
02.   2       , st
03.   3       , nvl (sum (sales) , 0)
04.   4    from (select 'done' st from dual
05.   5          union all
06.   6          select 'in process' from dual
07.   7         ) sts
08.   8    left outer
09.   9    join test_table1 t partition by (manager)
10.  10      on (t.status = sts.st)
11.  11   group by manager
12.  12          , st
13.  13   order by manager
14.  14          , st
15.  15  /
16.  
17.    MANAGER ST         NVL(SUM(SALES),0)
18. ---------- ---------- -----------------
19.         14 done                     800
20.         14 in process               400
21.         15 done                       0
22.         15 in process               200
23.         16 done                     200
24.         16 in process                 0
25.  
26. 6 rows selected.

 

内容概要:本文围绕EKF SLAM(扩展卡尔曼滤波同步定位与地图构建)的性能展开多项对比实验研究,重点分析在稀疏与稠密landmark环境下、预测与更新步骤同时进行与非同时进行的情况下的系统性能差异,并进一步探讨EKF SLAM在有色噪声干扰下的鲁棒性表现。实验考虑了不确定性因素的影响,旨在评估不同条件下算法的定位精度与地图构建质量,为实际应用中EKF SLAM的优化提供依据。文档还提及多智能体系统在遭受DoS攻击下的弹性控制研究,但核心内容聚焦于SLAM算法的性能测试与分析。; 适合人群:具备一定机器人学、状态估计或自动驾驶基础知识的科研人员及工程技术人员,尤其是从事SLAM算法研究或应用开发的硕士、博士研究生和相关领域研发人员。; 使用场景及目标:①用于比较EKF SLAM在不同landmark密度下的性能表现;②分析预测与更新机制同步与否对滤波器稳定性与精度的影响;③评估系统在有色噪声等非理想观测条件下的适应能力,提升实际部署中的可靠性。; 阅读建议:建议结合MATLAB仿真代码进行实验复现,重点关注状态协方差传播、观测更新频率与噪声模型设置等关键环节,深入理解EKF SLAM在复杂环境下的行为特性。稀疏 landmark 与稠密 landmark 下 EKF SLAM 性能对比实验,预测更新同时进行与非同时进行对比 EKF SLAM 性能对比实验,EKF SLAM 在有色噪声下性能实验
内容概要:本文围绕“基于主从博弈的售电商多元零售套餐设计与多级市场购电策略”展开,结合Matlab代码实现,提出了一种适用于电力市场化环境下的售电商优化决策模型。该模型采用主从博弈(Stackelberg Game)理论构建售电商与用户之间的互动关系,售电商作为领导者制定电价套餐策略,用户作为跟随者响应电价并调整用电行为。同时,模型综合考虑售电商在多级电力市场(如日前市场、实时市场)中的【顶级EI复现】基于主从博弈的售电商多元零售套餐设计与多级市场购电策略(Matlab代码实现)购电组合优化,兼顾成本最小化与收益最大化,并引入不确定性因素(如负荷波动、可再生能源出力变化)进行鲁棒或随机优化处理。文中提供了完整的Matlab仿真代码,涵盖博弈建模、优化求解(可能结合YALMIP+CPLEX/Gurobi等工具)、结果可视化等环节,具有较强的可复现性和工程应用价值。; 适合人群:具备一定电力系统基础知识、博弈论初步认知和Matlab编程能力的研究生、科研人员及电力市场从业人员,尤其适合从事电力市场运营、需求响应、售电策略研究的相关人员。; 使用场景及目标:① 掌握主从博弈在电力市场中的建模方法;② 学习售电商如何设计差异化零售套餐以引导用户用电行为;③ 实现多级市场购电成本与风险的协同优化;④ 借助Matlab代码快速复现顶级EI期刊论文成果,支撑科研项目或实际系统开发。; 阅读建议:建议读者结合提供的网盘资源下载完整代码与案例数据,按照文档目录顺序逐步学习,重点关注博弈模型的数学表达与Matlab实现逻辑,同时尝试对目标函数或约束条件进行扩展改进,以深化理解并提升科研创新能力。
内容概要:本文介绍了基于粒子群优化算法(PSO)的p-Hub选址优化问基于粒子群优化算法的p-Hub选址优化(Matlab代码实现)题的Matlab代码实现,旨在解决物流与交通网络中枢纽节点的最优选址问题。通过构建数学模型,结合粒子群算法的全局寻优能力,优化枢纽位置及分配策略,提升网络传输效率并降低运营成本。文中详细阐述了算法的设计思路、实现步骤以及关键参数设置,并提供了完整的Matlab仿真代码,便于读者复现和进一步改进。该方法适用于复杂的组合优化问题,尤其在大规模网络选址中展现出良好的收敛性和实用性。; 适合人群:具备一定Matlab编程基础,从事物流优化、智能算法研究或交通运输系统设计的研究生、科研人员及工程技术人员;熟悉优化算法基本原理并对实际应用场景感兴趣的从业者。; 使用场景及目标:①应用于物流中心、航空枢纽、快递分拣中心等p-Hub选址问题;②帮助理解粒子群算法在离散优化问题中的编码与迭代机制;③为复杂网络优化提供可扩展的算法框架,支持进一步融合约束条件或改进算法性能。; 阅读建议:建议读者结合文中提供的Matlab代码逐段调试运行,理解算法流程与模型构建逻辑,重点关注粒子编码方式、适应度函数设计及约束处理策略。可尝试替换数据集或引入其他智能算法进行对比实验,以深化对优化效果和算法差异的理解。
### JOIN - **含义**:JOIN用于将两个或多个表中的行根据指定的条件进行连接,以获取更全面的数据信息。 - **用法**:常见的JOIN类型有内连接(INNER JOIN)、左外连接(LEFT JOIN)、右外连接(RIGHT JOIN)和全外连接(FULL JOIN)等。 - **内连接(INNER JOIN)**:仅返回两张表中匹配的行。例如,查询有订单的用户信息: ```sql SELECT * FROM users INNER JOIN orders ON users.user_id = orders.user_id; ``` - **左外连接(LEFT JOIN)**:返回左表所有行,右表匹配不到的行用NULL填充。例如,查询所有用户及其订单,包括未下单用户: ```sql SELECT * FROM users LEFT JOIN orders ON users.user_id = orders.user_id; ``` - **右外连接(RIGHT JOIN)**:与LEFT JOIN相反,返回右表所有行,实际中较少使用,可用LEFT JOIN + 表交换替代。例如: ```sql SELECT * FROM users RIGHT JOIN orders ON users.user_id = orders.user_id; ``` - **全外连接(FULL JOIN)**:返回左右表所有行,匹配不到的用NULL填充(MySQL不支持,需用LEFT JOIN UNION RIGHT JOIN模拟)。例如: ```sql SELECT * FROM users FULL JOIN orders ON users.user_id = orders.user_id; -- MySQL模拟 SELECT * FROM users LEFT JOIN orders ON users.user_id = orders.user_id UNION SELECT * FROM users RIGHT JOIN orders ON users.user_id = orders.user_id; ``` ### PARTITION - **含义**:Partition即分区,是数据库中一种将表数据分散存储在不同物理位置的技术,主要用于提高查询性能和管理数据。 - **用法**:以MySQL为例,创建分区表时可以按范围(RANGE)、列表(LIST)、哈希(HASH)、键(KEY)等方式进行分区。 - **范围分区(RANGE)**:根据列值的范围进行分区。例如,按日期范围对订单表进行分区: ```sql CREATE TABLE orders ( order_id INT, order_date DATE ) PARTITION BY RANGE (YEAR(order_date)) ( PARTITION p2022 VALUES LESS THAN (2023), PARTITION p2023 VALUES LESS THAN (2024), PARTITION p2024 VALUES LESS THAN MAXVALUE ); ``` - **列表分区(LIST)**:根据列值的列表进行分区。例如,按地区对用户表进行分区: ```sql CREATE TABLE users ( user_id INT, region VARCHAR(50) ) PARTITION BY LIST COLUMNS (region) ( PARTITION p_north VALUES IN ('North', 'Northeast'), PARTITION p_south VALUES IN ('South', 'Southeast') ); ``` ### UNION - **含义**:UNION用于合并两个或多个SELECT语句的结果集,要求每个SELECT语句的列数和数据类型必须相同。 - **用法**:有UNION和UNION ALL两种。 - **UNION**:会去除结果集中的重复行。例如: ```sql SELECT column1 FROM table1 UNION SELECT column1 FROM table2; ``` - **UNION ALL**:不会去除重复行,只是简单地将结果集合并,性能上比UNION稍快。例如: ```sql SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2; ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值