Informatica中实现count(distinct)

本文介绍了一种使用双聚合器的方法来处理数据集,通过这种方式可以有效地计算每个名称下的不同ID数量及其总金额。首先,利用第一个聚合器按名称和ID进行分组并求和;接着,在第二个聚合器中按名称分组,计算ID的不重复计数和总金额。此外,还提供了一种标记记录的方法,以便在数据已排序的情况下更高效地实现这一目标。
Thanks for your response. Here is the sample data information: 

A id1 $200 
A id1 $300 
A id2 $150 
B id3 $100 
B id4 $20 

I want the following in the output: 
Name distinct-Count Totalamt 
A 2 $650 

B 2 $120 


One way to do this is to use 2 aggregators. The First aggregator group and sum amt by Name and ID. 
On the second aggregator group by Name and count id and sum amt.

So if I remove the duplicate row in the Sorter I will get incorrect distinct-count and totalamt for "A".

There are two different ways. 

1) Suggested by Manas - to use two aggregates. 
2)Flaging the record as 1 and 0 before aggregate. For this, you should have sorted data on name and id. 
If you have sorted data coming from source, try 2nd option as given below. 
In expression transformation before aggregator. 

IN_ID 
v_ID_CNT = IIF(ISNULL(v_ID) Or IN_ID!= V_ID,1, 0) 
O_ID_CNT=V_ID_CNT 
V_ID=IN_ID 

O_ID_CNT will go in to your aggregator. 

You need to create one more port in aggregator for SUM(O_ID_CNT) which will return the distinct count of IDs. 
(No change in GROUP BY columns)

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值