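Commonly used functions in R's dplyr package

This article introduces key functions in the dplyr package, such as group_by, filter, select, mutate, and the various join functions, and shows how to use them to clean, select, compute, and merge data for efficient data processing and analysis.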


# load dplyr

```{r}
rm(list=ls())
# install dplyr only if it is not available yet (the package name must be quoted)
if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
vignette(package = "dplyr")
```

dplyr provides many data manipulation functions, such as   
**group_by**   
**filter**    
**distinct**   
**arrange**   
**left_join**   
etc.   
For a quick reference, see the data transformation cheat sheet:   
* [github](https://github.com/rstudio/cheatsheets/blob/main/data-transformation)   

# select
**Use `select()` to choose which variables (columns) of a dataset to keep.**   
**Prefix a variable with a minus sign (-) to exclude it.**   
 
Next, we use the diamonds dataset that ships with ggplot2 to show how select works.   
```{r}
library(ggplot2)
data(diamonds, package = "ggplot2")
# show the first six rows
head(diamonds)
# choose the variables from carat to price (the colon selects a range)
diamonds_se1 <- diamonds %>%
                select(carat:price)

# same range, but exclude clarity
diamonds_se2 <- diamonds %>%
                select(carat:price,-clarity)

#other helpers select variables by matching patterns in their names:
#starts_with:starts with a prefix
#ends_with:ends with a suffix
#contains:contains a literal string
#matches:matches a regular expression
#num_range:matches a numerical range like x01 x02 x03

diamonds_se3 <- diamonds %>%
                select(starts_with("d")|ends_with("y")|contains("i")|matches("c.+t")|num_range("x",1:3,suffix="t"))
```
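
To see exactly which columns each form of `select()` keeps, it can help to try it on a tiny table. The sketch below uses a made-up tibble called `toy`; its column names are assumptions chosen for illustration and are not part of diamonds.

```{r}
library(dplyr)

# hypothetical toy data, invented for illustration only
toy <- tibble(
  id    = 1:2,
  x1    = c(0.1, 0.2),
  x2    = c(1.5, 2.5),
  price = c(10, 20),
  city  = c("a", "b")
)

sel_range   <- names(select(toy, x1:price))              # a range of adjacent columns
sel_minus   <- names(select(toy, x1:price, -x2))         # the same range without x2
sel_prefix  <- names(select(toy, starts_with("x")))      # names beginning with "x"
sel_suffix  <- names(select(toy, ends_with("y")))        # names ending with "y"
sel_literal <- names(select(toy, contains("ic")))        # names containing the literal "ic"
sel_regex   <- names(select(toy, matches("^x[0-9]$")))   # names matching a regular expression
sel_numrng  <- names(select(toy, num_range("x", 1:2)))   # x1, x2

sel_range
sel_minus
```

Printing the other `sel_*` vectors in the same way shows which names each helper matched.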

# filter
**Use `filter()` to keep only the observations (rows) that meet the given conditions.**   

```{r}
diamonds_fl1 <- diamonds %>%
                filter(cut %in% c("Very Good","Good"),price>3100)

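# column names and thresholds stored in vectors;
# .data[[name]] looks up a column by its name given as a string
vars <- c("depth","price")
cond <- c(60,3000)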
diamonds_fl2 <- diamonds %>%
              filter(y > mean(y,na.rm=TRUE),
                .data[[vars[[1]]]] > cond[[1]],
                .data[[vars[[2]]]] > cond[[2]]
              )
```
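
Two details of the calls above are easy to miss: conditions separated by commas must all hold (logical AND), and `.data[[name]]` lets the column be chosen at run time. A minimal sketch on a made-up tibble (`toy` and its values are assumptions for illustration):

```{r}
library(dplyr)

# hypothetical toy data, invented for illustration only
toy <- tibble(
  cut   = c("Good", "Ideal", "Very Good", "Good"),
  depth = c(61, 62, 59, 63),
  price = c(3500, 4000, 3200, 2900)
)

# comma-separated conditions are combined with AND
both <- toy %>% filter(cut %in% c("Very Good", "Good"), price > 3100)

# the same filter written with an explicit &
both_amp <- toy %>% filter(cut %in% c("Very Good", "Good") & price > 3100)

# the column to filter on is held in a character variable
col  <- "depth"
deep <- toy %>% filter(.data[[col]] > 60)

both
deep
```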

# mutate
**Use `mutate()` to compute new variables from existing ones.**   

```{r}
diamonds_mu1 <- diamonds %>%
               mutate(price1 = price*0.9,cprice=carat*price+price,
                      type=if_else(price>mean(price,na.rm=TRUE),">mean","<=mean"))
```
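
Within a single `mutate()` call, later expressions can already use columns created earlier in that call. A small sketch on invented data (`toy`, `price_per_carat` and `pricey` are hypothetical names used only here):

```{r}
library(dplyr)

# hypothetical toy data, invented for illustration only
toy <- tibble(carat = c(0.5, 1.0, 2.0), price = c(1000, 4000, 16000))

toy_mu <- toy %>%
  mutate(
    price_per_carat = price / carat,
    # price_per_carat was created on the line above and is already usable here
    pricey = if_else(price_per_carat > mean(price_per_carat),
                     "above mean", "at or below mean")
  )

toy_mu
```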

# summarise function
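**Use `summarise()` together with `group_by()` to compute one row of summary statistics per group. Useful summary functions:**    

* center: mean, median    
* spread: sd, IQR, mad    
* range: min, max, quantile   
* position: first, last, nth    
* count: n, n_distinct    
* logical: any, all    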

```{r}
diamonds_groupby <-diamonds %>%
                   group_by(color) %>%
                   summarise(nrow=n(),
                             mean_price=round(mean(price,na.rm=TRUE),2),
                             std_price=round(sd(price,na.rm=TRUE),2))
```
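
Several of the summary functions listed above can be combined in one `summarise()` call. The sketch below runs them on a made-up tibble so the results are easy to check by hand (`toy` and the output column names are assumptions for illustration):

```{r}
library(dplyr)

# hypothetical toy data, invented for illustration only
toy <- tibble(
  color = c("D", "D", "D", "E", "E"),
  price = c(100, 300, 200, 50, 150)
)

toy_summary <- toy %>%
  group_by(color) %>%
  summarise(
    n_obs        = n(),                # count
    median_price = median(price),      # center
    iqr_price    = IQR(price),         # spread
    min_price    = min(price),         # range
    max_price    = max(price),
    first_price  = first(price),       # position
    n_prices     = n_distinct(price),  # number of distinct values
    any_over_250 = any(price > 250)    # logical
  )

toy_summary
```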

# arrange
**Use `arrange()` to sort the rows of a dataset.**    

```{r}
# arrange() sorts in ascending order by default
# wrap a column in desc() to sort it in descending order
diamonds_arrange1 <- diamonds %>%
                     group_by(color) %>%
                     summarise(nrow=n(),mean_price=mean(price),std_price=sd(price)) %>%
                     arrange(nrow,desc(mean_price))
```
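
When several columns are passed to `arrange()`, later columns only break ties in the earlier ones. A minimal sketch on invented data (`toy` and its values are assumptions for illustration):

```{r}
library(dplyr)

# hypothetical toy data, invented for illustration only
toy <- tibble(
  color      = c("D", "E", "F"),
  n_obs      = c(2, 1, 2),
  mean_price = c(300, 500, 400)
)

# sort by n_obs ascending; rows with equal n_obs are ordered by mean_price descending
toy_sorted <- toy %>% arrange(n_obs, desc(mean_price))

toy_sorted
```

Here E comes first because it has the smallest `n_obs`; D and F tie on `n_obs`, so F (the higher `mean_price`) precedes D.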

# join

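Mutating joins:   

* inner_join: keep only the observations that appear in both x and y    
    * base R equivalent: `merge(x, y)`    
    * SQL equivalent: `SELECT * FROM x INNER JOIN y`    
* left_join: keep all observations in x    
    * base R equivalent: `merge(x, y, all.x = TRUE)`    
    * SQL equivalent: `SELECT * FROM x LEFT JOIN y`    
* right_join: keep all observations in y    
    * base R equivalent: `merge(x, y, all.y = TRUE)`    
    * SQL equivalent: `SELECT * FROM x RIGHT JOIN y`    
* full_join: keep all observations in x and y    
    * base R equivalent: `merge(x, y, all.x = TRUE, all.y = TRUE)`    
    * SQL equivalent: `SELECT * FROM x FULL JOIN y`    

Filtering joins:     

* semi_join: keep the observations in x that have a match in y    
* anti_join: discard the observations in x that have a match in y    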

```{r}
# use the band_members and band_instruments datasets that ship with dplyr to demonstrate the joins
inner <- band_members %>%
         inner_join(band_instruments,by="name")
left <- band_members %>%
         left_join(band_instruments,by="name")
right <- band_members %>%
         right_join(band_instruments,by="name")
full <- band_members %>%
         full_join(band_instruments,by="name")

semi <- band_members %>%
        semi_join(band_instruments,by="name")
anti <- band_members %>%
        anti_join(band_instruments,by="name")
```
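
Comparing row and column counts is a quick way to see how the joins differ. The sketch below reuses the `band_members` and `band_instruments` datasets from dplyr; `join_rows` and `semi_cols` are hypothetical names introduced only for this example.

```{r}
library(dplyr)

# number of rows returned by each join on the two example tables
join_rows <- c(
  inner = nrow(inner_join(band_members, band_instruments, by = "name")),
  left  = nrow(left_join(band_members, band_instruments, by = "name")),
  right = nrow(right_join(band_members, band_instruments, by = "name")),
  full  = nrow(full_join(band_members, band_instruments, by = "name")),
  semi  = nrow(semi_join(band_members, band_instruments, by = "name")),
  anti  = nrow(anti_join(band_members, band_instruments, by = "name"))
)
join_rows

# filtering joins only filter x, so no columns from y are added
semi_cols <- names(semi_join(band_members, band_instruments, by = "name"))
semi_cols
```

Mutating joins add the columns of y to x, while the filtering joins return a subset of x with its original columns.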
