Pivoting Data in OWB

本文介绍如何使用 OWB 的数据透视操作符将包含年度季度销售额的单行记录转换为每季度一行的数据集,同时提供了从宽表到长表的具体实例,并展示了如何在 OWB 映射中设计这一转换流程。

The pivot transformation operator enables you to transform a single row of attributes into multiple rows in an efficient manner. This example illustrates transforming a table that has a row for each year with the quarterly sales in a table with a row for each quarter. The OWB pivot operator makes this simple (there is also an unpivot).

So taking a simple example as follows:

YEAR   Q1_sales      Q2_sales Q3_sales   Q4_sales
---------- ---------- ---------- ---------- ----------
      2005      10000      15000      14000      25000
      2006      12000      16000      15000      35000
      2007      16000      19000      15000      34000

we wish to transform the data set to the following with a row for each quarter:

   YEAR QTR    SALES
---------- -- ----------
      2005 Q1      10000
      2006 Q1      12000
      2007 Q1      16000
      2005 Q2      15000
      2006 Q2      16000
      2007 Q2      19000
      2005 Q3      14000
etc..

We can design this in the OWB mapping as;
Pivot0:

Looking at the internals of the operator we see how this is described. The pivot operator allows you to define the input columns, the output columns and how the data is pivoted. This is achieved by defining a few pieces of information;
  • the key columns (the columns from the source that will appear in the output of the pivoted data)
  • the row locator (this is the pivot column)
  • the pivot transformation (which values to project for the pivoted columns
Firstly in our example we define the key column to be YEAR, this will be the same for each pivoted row;
Pivot1:

Then we define the QUARTER column as the pivot column, this is the row locator (in OWB terms).
Pivot2:

Finally we define how the row is transformed from a row with columns to a number of rows, we do this by entering a row in the table for each case we desire (so we have a row for Q1, a row for Q2, a row ... etc.).
Pivot3:

This makes the map design so much simpler as you can see, since the operator encapsulates the transformation in a simple manner. The example has been scripted ( get the script here) so that you can create it and have a look around at how this is done....

Here is the sample data I used also:
--
-- Execute the following where you deployed SALES_BY_QTR
-- I manually added some rows in SALES_BY_QTR for the example:
--
insert into SALES_BY_QTR values (2005, 10000, 15000, 14000, 25000);
insert into SALES_BY_QTR values (2006, 12000, 16000, 15000, 35000);
insert into SALES_BY_QTR values (2007, 16000, 19000, 15000, 34000);
commit;

The reverse of this scenario is the unpivot, the script for the unpivot example can be found here.

You select the key just like the pivot above, then define the row locator (or unpivot column), defining the values for each match row:
Unpivot0:

Then define the output attributes for the unpivot:
Unpivot1:

Finally define the unpivot transformations (how the column data is taken from the matching row):
Unpivot2:

If your data has many rows with sales values for a quarter (for a single year) you will need to aggregate the data before unpivoting, for example the map below first aggregates and sums sales before unpivoting. The data is grouped by YEAR (key) and QUARTER (row locator) and
the output expression has SUM(SALES), the map then unpivots that data. (you cannot tweak the agg function
just now in the unpivot)
Unpivot3:

Hope this is useful and helps illustrates the pivoting transformation capability.

### 数据集的 Pivot 操作 在数据分析领域,`pivot` 是一种常见的操作方法,用于将数据从长格式转换为宽格式。以下是关于如何通过不同工具实现 `pivot` 的详细介绍。 #### Excel 中的 PowerPivot 使用 在 Microsoft Excel 中,PowerPivot 提供了一种强大的方式来集成多个数据源并生成数据透视表或图表。它允许用户轻松地汇总、分析和展示摘要数据[^1]。具体来说,可以通过以下功能完成数据集的转置: - **数据透视表**:利用 PowerPivot 插入的数据模型,可以快速构建数据透视表,从而调整行列布局以满足特定需求。 此外,Power Pivot 支持 DAX(Data Analysis Expressions)语言,这使得复杂计算成为可能,并能与其他微软生态系统中的工具无缝协作,比如 Power BI[^3]。 #### Pandas 库中的 Pivot 方法 对于 Python 用户而言,在 Pandas 库中提供了灵活的方法来进行类似的 pivot 操作。例如,如果目标是从原始表格结构转变为指定的行索引、列标签以及数值字段,则可采用如下代码片段: ```python import pandas as pd df_pivot = df.pivot(index='date', columns='product', values='sales') print(df_pivot) ``` 上述脚本展示了怎样依据日期(`date`)设置新表的行索引,产品名称(`product`)定义各列头项,而销售金额(`sales`)填充相应单元格的内容[^2]。 这种技术非常适合那些需要频繁切换视图或者重新整理大量记录的研究人员和技术专家们。 ### 总结 无论是借助于桌面应用程序如Microsoft Office套件内的组件还是编程环境下的第三方库函数, 都能够有效地执行数据重塑任务即所谓的"pivoting". 这些手段不仅简化了繁杂的手动过程而且提高了工作效率.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值