tidyverse study notes: Data tidying

Data tidying

Rules that make a dataset tidy

  1. Each variable is a column; each column is a variable.
  2. Each observation is a row; each row is an observation.
  3. Each value is a cell; each cell is a single value.

pivot

The pivot functions in tidyr, pivot_longer() and pivot_wider(), reshape a dataset into tidy form.

Lengthening data

pivot_longer

pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns.
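To make the row/column trade-off concrete, here is a minimal sketch on a made-up table. The tibble `df` and its columns `id`, `bp1`, `bp2`, `bp3` are hypothetical and exist only for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: one row per id, one column per measurement.
df <- tribble(
  ~id, ~bp1, ~bp2, ~bp3,
  "A",  100,  120,  125,
  "B",  140,  115,  120,
  "C",  120,  125,  130
)

# Move the three measurement columns into name/value pairs.
long <- df |>
  pivot_longer(
    cols = bp1:bp3,
    names_to = "measurement",
    values_to = "value"
  )

dim(df)    # 3 rows, 4 columns
dim(long)  # 9 rows, 3 columns
```

Each of the 3 original rows contributes one output row per pivoted column, so the row count grows to 9 while the three measurement columns collapse into two.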

Data in column names
artist        date.entered  wk1  wk2  wk3  wk4  wk5
2 Pac         2000-02-26     87   82   72   77   87
2Ge+her       2000-09-02     91   87   92   NA   NA
3 Doors Down  2000-04-08     81   70   68   67   66
3 Doors Down  2000-10-21     76   76   72   69   67
504 Boyz      2000-04-15     57   34   25   17   17
98^0          2000-08-19     51   39   34   26   26
billboard |> 
    pivot_longer(
        cols = starts_with("wk"), 
        names_to = "week", 
        values_to = "rank",
        values_drop_na = TRUE
    )
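The effect of values_drop_na can be checked on the six rows shown above. This is a small sketch, assuming those rows are retyped by hand into a tibble named `bb` (a name made up here) with the date.entered column left out for brevity.

```r
library(tidyr)
library(tibble)

# The six rows from the table above, retyped by hand (date.entered omitted).
bb <- tribble(
  ~artist,        ~wk1, ~wk2, ~wk3, ~wk4, ~wk5,
  "2 Pac",          87,   82,   72,   77,   87,
  "2Ge+her",        91,   87,   92,   NA,   NA,
  "3 Doors Down",   81,   70,   68,   67,   66,
  "3 Doors Down",   76,   76,   72,   69,   67,
  "504 Boyz",       57,   34,   25,   17,   17,
  "98^0",           51,   39,   34,   26,   26
)

# Default behaviour: every artist/week pair becomes a row, NA or not.
keep_na <- bb |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank"
  )

# With values_drop_na = TRUE, rows whose rank is NA are dropped.
drop_na <- bb |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  )

nrow(keep_na)  # 30
nrow(drop_na)  # 28
```

The two missing rows are wk4 and wk5 for 2Ge+her. Here the NAs only mean that the song was no longer charting in those weeks, so dropping them loses no information.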
Variables in column names
country      year  sp_m_014  sp_m_1524  sp_m_2534  sp_m_3544  sp_m_4554
Afghanistan  1980        NA         NA         NA         NA         NA
Afghanistan  1981        NA         NA         NA         NA         NA
Afghanistan  1982        NA         NA         NA         NA         NA
Afghanistan  1983        NA         NA         NA         NA         NA
Afghanistan  1984        NA         NA         NA         NA         NA
Afghanistan  1985        NA         NA         NA         NA         NA

Each column name packs three variables separated by "_": “sp” refers to the diagnosis method, “m” and “f” denote gender, and “014” indicates ages 0 to 14.

who2 |> 
    pivot_longer(
        cols = !(country:year),
        names_to = c("diagnosis", "gender", "age"), 
        names_sep = "_",
        values_to = "count"
    )
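To see what names_sep does without loading the full dataset, the following sketch runs the same call on a one-row table. The tibble `toy` and its values are invented for illustration and only mimic the who2 naming scheme.

```r
library(tidyr)
library(tibble)

# Hypothetical one-row table that mimics the who2 column naming scheme.
toy <- tribble(
  ~country, ~year, ~sp_m_014, ~sp_f_014,
  "X",       2000,        10,        12
)

# Split each pivoted column name at "_" into three new variables.
toy_long <- toy |>
  pivot_longer(
    cols = !(country:year),
    names_to = c("diagnosis", "gender", "age"),
    names_sep = "_",
    values_to = "count"
  )

names(toy_long)
# "country" "year" "diagnosis" "gender" "age" "count"
```

Because names_to has three entries, every pivoted column name must split into exactly three pieces. The resulting diagnosis, gender, and age columns are character vectors, so the age value stays the string "014".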

Widening data
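pivot_wider() is the inverse of pivot_longer(): it increases the number of columns and decreases the number of rows. A minimal sketch on made-up data follows; the tibble `long_df` and its columns are hypothetical and unrelated to the dataset used below.

```r
library(tidyr)
library(tibble)

# Hypothetical long table: one row per id/measurement pair.
long_df <- tribble(
  ~id, ~measurement, ~value,
  "A", "bp1",           100,
  "A", "bp2",           120,
  "B", "bp1",           140,
  "B", "bp2",           115
)

# Spread the measurement names into columns, filled from `value`.
wide_df <- long_df |>
  pivot_wider(
    names_from = measurement,
    values_from = value
  )

dim(long_df)  # 4 rows, 3 columns
dim(wide_df)  # 2 rows, 3 columns (id, bp1, bp2)
```

Columns not named in names_from or values_from (here only `id`) identify the output rows.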

cms_patient_experience |> 
    pivot_wider(
        names_from = measure_cd,
        values_from =