R-tidyverse

Original article · last modified 2022-08-30 10:38:48

License: CC 4.0 BY-SA

Tags: #R language

First published 2022-08-05 16:53:11

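This article walks through data manipulation with the R tidyverse packages: extracting rows with head, tail, and slice; picking specific rows within groups; and filtering with slice_head, distinct, and related functions. It also covers splitting and combining columns with separate and unite, and removing duplicate rows from a data frame. These techniques come up constantly in data preprocessing and exploration.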

Reference: https://bookdown.org/wangminjie/R4DS/tidyverse-readr.html

13.4 Usage

13.4.1 Extracting rows

head() and tail() return the first and last rows of a data frame, respectively; the default is 6 rows.
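As a quick illustration of the default and of the n argument, here is a minimal sketch. It uses the built-in mtcars data frame as a stand-in (an assumption made so the snippet runs without extra packages; the article itself works with penguins).

```r
# mtcars ships with base R and has 32 rows
first_rows <- head(mtcars)          # default: n = 6
last_rows  <- tail(mtcars, n = 3)   # n overrides the default

nrow(first_rows)
nrow(last_rows)
```

Passing n explicitly is usually clearer than relying on the default when the row count matters downstream.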

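slice() extracts the rows at the given positions.

# Index by row position; a minus sign in front inverts the selection, e.g. slice(-(2:5))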
penguins %>% slice(2:5)

# With grouped data, slice(1) returns the first row of each group
penguins %>% 
  group_by(species) %>% 
  slice(1)

# Keep only the first 2 rows of each group
penguins %>% 
  group_by(species) %>% 
  slice_head( n=2)

# prop = 0.5 keeps the first half of the rows within each group
penguins %>% 
  group_by(species) %>% 
  slice_head( prop = 0.5 )
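The negative-index form and the counterpart slice_tail() are not shown above, so here is a small sketch of both. The tibble df and its columns id and grp are made up for illustration; only dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data: 6 rows in two groups of 3
df <- tibble(id = 1:6, grp = rep(c("a", "b"), each = 3))

# Negative positions drop rows: this removes rows 2 through 5
dropped <- df %>% slice(-(2:5))
dropped$id

# slice_tail() mirrors slice_head(): here, the last row of each group
last_per_group <- df %>%
  group_by(grp) %>%
  slice_tail(n = 1) %>%
  ungroup()
last_per_group$id
```

Note the parentheses in -(2:5); writing -2:5 would mix negative and positive positions, which slice() rejects.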

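The slice family indexes rows by position. The examples below show three ways to find the row with the largest bill_length_mm.

## Row(s) where bill_length_mm is largest
### Method 1
# na.rm = TRUE matters if the column contains NA; otherwise max() returns NA
# and the filter keeps no rows
penguins %>% 
  filter(bill_length_mm == max(bill_length_mm, na.rm = TRUE))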

### Method 2
penguins %>% 
  arrange(desc(bill_length_mm)) %>% 
  slice(1)

### Method 3
penguins %>% 
  slice_max(bill_length_mm)

### Random sampling; replace = TRUE samples with replacement
iris %>% as_tibble() %>% slice_sample(n = 5, replace = TRUE)
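The three approaches above can differ when the maximum is tied or the column has missing values. The following sketch, on a made-up tibble called scores (hypothetical data, not from the article), shows the behavior I would expect from current dplyr.

```r
library(dplyr)

# Hypothetical data: two rows tie for the maximum, one value is missing
scores <- tibble(name = c("a", "b", "c", "d"),
                 score = c(10, 30, 30, NA))

# slice_max() keeps ties by default, so both tied rows come back
with_ties <- scores %>% slice_max(score, n = 1)

# with_ties = FALSE returns exactly one row
one_row <- scores %>% slice_max(score, n = 1, with_ties = FALSE)

# filter() + max() needs na.rm = TRUE when NA is present
via_filter <- scores %>% filter(score == max(score, na.rm = TRUE))

nrow(with_ties)
nrow(one_row)
nrow(via_filter)
```

arrange(desc(...)) followed by slice(1) behaves like the with_ties = FALSE case: it always returns a single row.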

separate() splits one column into several; unite() combines several columns into one.

tb <- tibble::tribble(
  ~day, ~price,
  1,   "30-45",
  2,   "40-95",
  3,   "89-65",
  4,   "45-63",
  5,   "52-42"
)

tb1 <- tb %>% 
  separate(price, into = c("low", "high"), sep = "-")
tb1

tb1 %>% 
  unite(col = "price", c(low, high), sep = ":", remove = FALSE)
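One thing to watch in the example above is that separate() leaves low and high as character columns. A minimal sketch of one way to get numbers instead is the convert argument; the two-row tibble and the spread column here are illustrative assumptions, not part of the original example.

```r
library(dplyr)
library(tidyr)

# Shortened version of the price table, for illustration only
tb <- tibble(day = 1:2, price = c("30-45", "40-95"))

# convert = TRUE asks separate() to guess column types after splitting
tb_num <- tb %>%
  separate(price, into = c("low", "high"), sep = "-", convert = TRUE)

class(tb_num$low)

# With numeric columns, arithmetic works directly
tb_spread <- tb_num %>% mutate(spread = high - low)
tb_spread$spread
```

Without convert = TRUE, the same result needs an explicit as.numeric() or as.integer() on each new column.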

distinct() operates on a data frame and keeps only the unique rows. n_distinct() operates on a vector and counts how many different elements it contains, returning a single number.

df <- tibble::tribble(
  ~x, ~y, ~z,
  1, 1, 1,
  1, 1, 2,
  1, 1, 1,
  2, 1, 2,
  2, 2, 3,
  3, 3, 1
)
df

df %>%
  distinct()

df %>%
  distinct(x)

df %>%
  distinct(x, y)

df %>%
  distinct(x, y, .keep_all = TRUE) # keep only the first row for each (x, y) combination

df %>%
  distinct(
    across(c(x, y)),
    .keep_all = TRUE
  )

df %>%
  group_by(x) %>%
  distinct(y, .keep_all = TRUE)
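The examples above only exercise distinct(). To show n_distinct() next to it, here is a short sketch that rebuilds the same df so it runs on its own; the result names n_x, n_unique_rows, and per_group are introduced here for illustration.

```r
library(dplyr)

df <- tibble::tribble(
  ~x, ~y, ~z,
  1, 1, 1,
  1, 1, 2,
  1, 1, 1,
  2, 1, 2,
  2, 2, 3,
  3, 3, 1
)

# n_distinct() takes a vector and returns one number
n_x <- n_distinct(df$x)

# distinct() takes a data frame and returns a data frame
n_unique_rows <- nrow(distinct(df))

# n_distinct() is handy inside summarise() for per-group counts
per_group <- df %>%
  group_by(x) %>%
  summarise(n_y = n_distinct(y))

n_x
n_unique_rows
per_group
```

Here row 3 duplicates row 1, which is why distinct(df) has one row fewer than df.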