Implementing count(distinct) in Informatica

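This post describes a two-aggregator approach for computing, for each name, the number of distinct IDs and the total amount. The first aggregator groups by name and ID and sums the amount; the second aggregator groups by name, counts the IDs, and sums the amount. It also describes a record-flagging method that reaches the same result more efficiently when the data is already sorted.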

Thanks for your response. Here is the sample data information:

A id1 $200
A id1 $300
A id2 $150
B id3 $100
B id4 $20

I want the following in the output:
Name distinct-Count Totalamt
A 2 $650
B 2 $120

One way to do this is to use 2 aggregators. The first aggregator groups by Name and ID and sums amt.
The second aggregator groups by Name, counts the IDs, and sums amt.
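The two-aggregator idea can be sketched outside Informatica to check the arithmetic. The Python below is only an illustration, not Informatica code: the function name `two_stage_aggregate` and the tuple layout `(name, id, amt)` are assumptions made for this sketch, and the rows are the sample data from the post.

```python
from collections import defaultdict

# Sample rows from the post, as (name, id, amt) tuples (layout is an assumption).
rows = [
    ("A", "id1", 200),
    ("A", "id1", 300),
    ("A", "id2", 150),
    ("B", "id3", 100),
    ("B", "id4", 20),
]


def two_stage_aggregate(rows):
    """Mimic two chained aggregators: (name, id) first, then name."""
    # Stage 1: group by (name, id) and sum amt, leaving one row per distinct id.
    stage1 = defaultdict(int)
    for name, id_, amt in rows:
        stage1[(name, id_)] += amt

    # Stage 2: group by name; counting stage-1 rows gives the distinct id count.
    result = {}
    for (name, _id), amt in stage1.items():
        count, total = result.get(name, (0, 0))
        result[name] = (count + 1, total + amt)
    return result


summary = two_stage_aggregate(rows)
for name in sorted(summary):
    count, total = summary[name]
    print(name, count, total)
```

Because stage 1 collapses each (name, id) pair to a single row without dropping any amount, a plain row count in stage 2 is already the distinct count and the totals are preserved.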

So if I remove the duplicate row in the Sorter, I will get an incorrect distinct-count and totalamt for "A".

There are two ways to do this.

1) Suggested by Manas: use two aggregators.
2) Flag each record as 1 or 0 before the aggregator. For this, the data must be sorted on name and id.
If sorted data is coming from the source, try the second option as given below.
In an Expression transformation before the aggregator:

IN_ID
V_ID_CNT = IIF(ISNULL(V_ID) OR IN_ID != V_ID, 1, 0)
O_ID_CNT = V_ID_CNT
V_ID = IN_ID

O_ID_CNT goes into your aggregator.

In the aggregator, create one more port for SUM(O_ID_CNT), which returns the distinct count of IDs.
(No change to the GROUP BY columns.)
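The flagging logic can be sketched the same way. The Python below is an illustration only, not Informatica code: `flag_and_aggregate` and the `(name, id, amt)` tuple layout are assumed names for this sketch. It mirrors the expression in the post, which compares the current ID with the previous row's ID only, so it assumes the input is sorted on name and id and that the same ID never appears under two different names.

```python
# Sample rows from the post, already sorted on name and id (layout is an assumption).
sorted_rows = [
    ("A", "id1", 200),
    ("A", "id1", 300),
    ("A", "id2", 150),
    ("B", "id3", 100),
    ("B", "id4", 20),
]


def flag_and_aggregate(sorted_rows):
    """Flag the first row of each id with 1, then sum flags and amounts per name."""
    prev_id = None  # plays the role of the V_ID variable port
    result = {}
    for name, id_, amt in sorted_rows:
        # Same test as the expression: 1 when the id changes, otherwise 0.
        flag = 1 if prev_id is None or id_ != prev_id else 0
        prev_id = id_

        # Single aggregator grouped by name: SUM(flag) and SUM(amt).
        count, total = result.get(name, (0, 0))
        result[name] = (count + flag, total + amt)
    return result


flagged = flag_and_aggregate(sorted_rows)
for name in sorted(flagged):
    count, total = flagged[name]
    print(name, count, total)
```

If an ID could repeat across names, the comparison would also need to check for a change of name; that case is outside what the original expression handles.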