R&SQL合并记录并去重

R语言数据聚合

====================================================================================

data:

have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105, 105)),
info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five", "five"),
stringsAsFactors=FALSE)
want <- data.frame(ID=paste0("id", c(101:105)),
info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"),
stringsAsFactors=FALSE)

====================================================================================

 MySQL

SELECT
    user_id,
    COUNT(id) AS freq,
    GROUP_CONCAT(DISTINCT ip SEPARATOR ",") AS ips
FROM
    log_table
GROUP BY
    user_id
ORDER BY
    freq ASC;

====================================================================================

 R&SQL

library(sqldf)


sqldf("
SELECT ID,
GROUP_CONCAT(distinct info1) as info1
FROM have
GROUP BY ID")

注:此种情况没能实现更改间隔符,默认为逗号。

====================================================================================

 R

--------------------------------------------------------------------------------------------------------------------------------------

require(dplyr)

METHOD1:  have %>%    group_by(ID) %>%    summarise_each(funs(toString(sort(unique(info1)))))

METHOD2:  have %>%    group_by(ID) %>%    summarise(name = toString(sort(unique(info1))))

--------------------------------------------------------------------------------------------------------------------------------------

require(data.table)

METHOD1:  setDT(have)[, .(info1 = toString(sort(unique(info1)))), by = ID]
METHOD2:  setDT(have)[ , .(info1 = paste(unique(info1), collapse = ",")),by = ID]

注:paste函数作为其他函数的参数时,不能。例如

aggregate(have[,2], by=list(have$ID), paste(unique(info1)), collapse=";")   × 

aggregate(have[,2], by=list(have$ID), paste, collapse=";")  

setDT(have)[, lapply(.SD, paste(unique(info1)), collapse = "; "), by = ID]  ×

setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]  √

--------------------------------------------------------------------------------------------------------------------------------------

aggregate(data=have,info1~ID,FUN = function(t) sort(unique(t)))

 

转载于:https://www.cnblogs.com/Emily07/p/6509362.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值