Data tidying
Rules that make a dataset tidy
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.
pivot
The pivot functions in tidyr, pivot_longer() and pivot_wider(), reshape a dataset so that it follows the rules above.
Lengthening data
pivot_longer
pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns.
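The change in shape can be checked directly with dim(). The following is a minimal sketch on a made-up table; the `scores` tibble and its column names are hypothetical and do not come from the original notes.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per student, one column per test.
scores <- tibble(
  student = c("a", "b"),
  test1 = c(90, 85),
  test2 = c(70, 95),
  test3 = c(60, 80)
)

# Move the three test columns into a name column and a value column.
long <- scores |>
  pivot_longer(
    cols = c(test1, test2, test3),
    names_to = "test",
    values_to = "score"
  )

dim(scores)  # 2 rows, 4 columns
dim(long)    # 6 rows, 3 columns
```

Each of the two original rows becomes three rows (one per test), while the three test columns collapse into `test` and `score`.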
Data in column names
| artist | date.entered | wk1 | wk2 | wk3 | wk4 | wk5 |
|---|---|---|---|---|---|---|
| 2 Pac | 2000-02-26 | 87 | 82 | 72 | 77 | 87 |
| 2Ge+her | 2000-09-02 | 91 | 87 | 92 | NA | NA |
| 3 Doors Down | 2000-04-08 | 81 | 70 | 68 | 67 | 66 |
| 3 Doors Down | 2000-10-21 | 76 | 76 | 72 | 69 | 67 |
| 504 Boyz | 2000-04-15 | 57 | 34 | 25 | 17 | 17 |
| 98^0 | 2000-08-19 | 51 | 39 | 34 | 26 | 26 |
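library(tidyr)  # provides pivot_longer() and the billboard dataset

billboard |>
  pivot_longer(
    cols = starts_with("wk"),   # every week column: wk1, wk2, ...
    names_to = "week",          # old column names go here
    values_to = "rank",         # cell values go here
    values_drop_na = TRUE       # drop weeks in which the song did not chart
  )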
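To try this without the full dataset, here is a small sketch that uses two rows copied from the table above. The name `billboard_small` is made up for illustration, and the final mutate() step, which turns "wk1" into the number 1 with readr::parse_number(), is an optional follow-up that the original snippet does not include.

```r
library(tidyr)
library(dplyr)
library(readr)
library(tibble)

# Two rows typed in by hand from the table above (a stand-in for `billboard`).
billboard_small <- tribble(
  ~artist,   ~date.entered, ~wk1, ~wk2, ~wk3, ~wk4, ~wk5,
  "2 Pac",   "2000-02-26",  87,   82,   72,   77,   87,
  "2Ge+her", "2000-09-02",  91,   87,   92,   NA,   NA
)

billboard_long <- billboard_small |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE          # the two NA weeks of "2Ge+her" are dropped
  ) |>
  mutate(week = parse_number(week))  # optional extra step: "wk1" becomes 1

billboard_long
```

With values_drop_na = TRUE the result keeps five rows for 2 Pac and only three for 2Ge+her, because weeks 4 and 5 are missing for that song.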
Variables in column names
| country | year | sp_m_014 | sp_m_1524 | sp_f_2534 | sp_m_3544 | sp_m_4554 |
|---|---|---|---|---|---|---|
| Afghanistan | 1980 | NA | NA | NA | NA | NA |
| Afghanistan | 1981 | NA | NA | NA | NA | NA |
| Afghanistan | 1982 | NA | NA | NA | NA | NA |
| Afghanistan | 1983 | NA | NA | NA | NA | NA |
| Afghanistan | 1984 | NA | NA | NA | NA | NA |
| Afghanistan | 1985 | NA | NA | NA | NA | NA |
Each column name packs three variables separated by underscores: “sp” is the diagnosis method, “m” or “f” is the gender, and “014” is the age range (0 to 14 years).
who2 |>
  pivot_longer(
    cols = !(country:year),
    names_to = c("diagnosis", "gender", "age"),
    names_sep = "_",
    values_to = "count"
  )
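The effect of names_sep is easier to see on a tiny table. The sketch below uses made-up countries and counts; only the column-name pattern mirrors who2, and `who_small` is a hypothetical name.

```r
library(tidyr)
library(tibble)

# Made-up counts; only the column-name pattern (diagnosis_gender_age) mirrors `who2`.
who_small <- tribble(
  ~country, ~year, ~sp_m_014, ~sp_f_2534,
  "A",      2000,  10,        20,
  "B",      2000,  30,        40
)

who_long <- who_small |>
  pivot_longer(
    cols = !(country:year),
    names_to = c("diagnosis", "gender", "age"),  # one new column per name piece
    names_sep = "_",                             # split "sp_m_014" at each underscore
    values_to = "count"
  )

who_long
```

Each old column name is split into three pieces, so "sp_m_014" contributes "sp" to `diagnosis`, "m" to `gender`, and "014" to `age`.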
Widening data
cms_patient_experience |>
  pivot_wider(
    names_from = measure_cd,
    values_from =
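pivot_wider() works in the opposite direction to pivot_longer(): it increases the number of columns and decreases the number of rows. As a self-contained illustration, here is a minimal sketch on a made-up table; `readings` and its columns are hypothetical and are not taken from cms_patient_experience.

```r
library(tidyr)
library(tibble)

# Hypothetical long table: one row per (id, measurement) pair.
readings <- tribble(
  ~id, ~measurement, ~value,
  "A", "bp1",        100,
  "A", "bp2",        120,
  "B", "bp1",        140,
  "B", "bp2",        115
)

wide <- readings |>
  pivot_wider(
    names_from = measurement,  # values of this column become new column names
    values_from = value        # values of this column fill the new cells
  )

wide
```

The four long rows become two wide rows, one per `id`, with one column for each distinct value of `measurement`.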




