Making up Data with Partition Outer Join

稀疏数据处理

最新推荐文章于 2023-07-31 10:10:02 发布

转载最新推荐文章于 2023-07-31 10:10:02 发布 · 396 阅读

文章标签：

#join #up #manager #table #insert #extension

Oracle 专栏收录该内容

56 篇文章

订阅专栏

本文介绍如何使用Oracle SQL中的分区外部连接解决稀疏数据问题，确保每个经理的状态都包含所有可能的类别，即使某些状态在原始数据中缺失。

refer:

http://nuijten.blogspot.com/2009/07/making-up-data-with-partition-outer.html

Just the other day on the Oracle SQL and PL/SQL forum, someone asked on how to create non-existent rows. This post is not about generating a number of rows.
It's about handling Sparse Data, when you want to fill in some missing records in your result set.
First time I heard about this was in a blog written by Lucas Jellema.
Let's first start out with the table and some test data:

 
   01. 
   SQL> create table test_table1  
  
   02. 
     2  (status varchar2(15) 
  
   03. 
     3  ,manager number 
  
   04. 
     4  ,sales number); 
  
   05. 
     
   06. 
   Table created. 
  
   07. 
     
   08. 
   SQL> insert into test_table1 values ('in process', 14, 100); 
  
   09. 
     
   10. 
   1 row created. 
  
   11. 
     
   12. 
   SQL> insert into test_table1 values ('in process', 15, 10); 
  
   13. 
     
   14. 
   1 row created. 
  
   15. 
     
   16. 
   SQL> insert into test_table1 values ('in process', 15, 40); 
  
   17. 
     
   18. 
   1 row created. 
  
   19. 
     
   20. 
   SQL> insert into test_table1 values ('done', 14, 200); 
  
   21. 
     
   22. 
   1 row created. 
  
   23. 
     
   24. 
   SQL> insert into test_table1 values ('done', 16, 50); 
  
   25. 
     
   26. 
   1 row created. 
  
   27. 
   SQL> select * 
  
   28. 
     2    from test_table1 
  
   29. 
     3  / 
  
   30. 
     
   31. 
   STATUS             MANAGER      SALES 
  
   32. 
   --------------- ---------- ---------- 
  
   33. 
   in process              14        100 
  
   34. 
   in process              15         10 
  
   35. 
   in process              15         40 
  
   36. 
   done                    14        200 
  
   37. 
   done                    16         50

As you can see in the sample data, Manager 14 has entries for the status "in process" and "done". Manager 15 only has entries for "in process". Manager 16 only has a single entry for "done".
The result that we are after is to show for each Manager a total sales value for both statuses "in process" and "done".
When we use a regular SUM and GROUP BY:

 
   01. 
   SQL> select manager 
  
   02. 
     2       , status 
  
   03. 
     3       , sum(sales) 
  
   04. 
     4    from test_table1 
  
   05. 
     5   group by manager 
  
   06. 
     6          , status 
  
   07. 
     7   order by manager 
  
   08. 
     8          , status 
  
   09. 
     9  / 
  
   10. 
     
   11. 
      MANAGER STATUS          SUM(SALES) 
  
   12. 
   ---------- --------------- ---------- 
  
   13. 
           14 done                   200 
  
   14. 
           14 in process             100 
  
   15. 
           15 in process              50 
  
   16. 
           16 done                    50

we only see values for records that are actually in the table... Go Figure!

Nice results, but not exactly what we are after. We want an extra record for Manager 15 (with status "done" and sales value of 0) and an extra record for Manager 16 (with status "in process" and also a value of 0).

One way to tackle this problem (or challenge if you prefer) is to use a Partition Outer Join. As far as i know this is not ANSI-SQL, but Oracle specific syntax. Tahiti.Oracle.com calls it an "extension to the ANSI syntax".
To make this query work, we need a "table" (or inline view) which has all possible statuses. Something like

 
   01. 
   SQL> select 'done' st from dual 
  
   02. 
     2  union all 
  
   03. 
     3  select 'in process' from dual 
  
   04. 
     4  / 
  
   05. 
     
   06. 
   ST 
  
   07. 
   ---------- 
  
   08. 
   done 
  
   09. 
   in process

This inline view will be outer joined to our table.
What makes a Partition Outer Join work differently from a regular Outer Join?
A regular Outer Join will show an extra single record even when a matching value is not present. In our case, this will not make a difference as the values "done" and "in process" are present in our base table.
What we want is to outer join all statuses from the inline view to our base table for each manager.
And this is exactly what the Partition Clause does. It breaks up the result set per manager. Per partition (one for Manager 14, one for Manager 15 and one for Manager 16) we want to outer join to the inline view.

Putting it all together, and here is the final result:

 
   01. 
   SQL> select manager 
  
   02. 
     2       , st 
  
   03. 
     3       , nvl (sum (sales) , 0) 
  
   04. 
     4    from test_table1 t partition by (manager) 
  
   05. 
     5   right outer 
  
   06. 
     6    join (select 'done' st from dual 
  
   07. 
     7          union all 
  
   08. 
     8          select 'in process' from dual 
  
   09. 
     9         ) sts 
  
   10. 
    10      on (t.status = sts.st) 
  
   11. 
    11   group by manager 
  
   12. 
    12          , st 
  
   13. 
    13   order by manager 
  
   14. 
    14          , st 
  
   15. 
    15  / 
  
   16. 
     
   17. 
      MANAGER ST         NVL(SUM(SALES),0) 
  
   18. 
   ---------- ---------- ----------------- 
  
   19. 
           14 done                     600 
  
   20. 
           14 in process               300 
  
   21. 
           15 done                       0 
  
   22. 
           15 in process               150 
  
   23. 
           16 done                     150 
  
   24. 
           16 in process                 0 
  
   25. 
     
   26. 
   6 rows selected.

Each Manager shows an entry for both statuses "done" and "in process", even when this value is not in the base table.

If -for whatever reason- you don't like RIGHT OUTER, just flip the tables around and call it a LEFT OUTER:

 
   01. 
   SQL> select manager 
  
   02. 
     2       , st 
  
   03. 
     3       , nvl (sum (sales) , 0) 
  
   04. 
     4    from (select 'done' st from dual 
  
   05. 
     5          union all 
  
   06. 
     6          select 'in process' from dual 
  
   07. 
     7         ) sts 
  
   08. 
     8    left outer 
  
   09. 
     9    join test_table1 t partition by (manager) 
  
   10. 
    10      on (t.status = sts.st) 
  
   11. 
    11   group by manager 
  
   12. 
    12          , st 
  
   13. 
    13   order by manager 
  
   14. 
    14          , st 
  
   15. 
    15  / 
  
   16. 
     
   17. 
      MANAGER ST         NVL(SUM(SALES),0) 
  
   18. 
   ---------- ---------- ----------------- 
  
   19. 
           14 done                     800 
  
   20. 
           14 in process               400 
  
   21. 
           15 done                       0 
  
   22. 
           15 in process               200 
  
   23. 
           16 done                     200 
  
   24. 
           16 in process                 0 
  
   25. 
     
   26. 
   6 rows selected.