1. Thinking in ratios
For example, answering the question: how many times as many friends do female users have as male users?
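As a minimal sketch of the idea, the ratio can be computed directly by dividing one vector by the other. The numbers below are the median friend counts for ages 13 to 15 copied from the summary table in these notes; the variable names `female`, `male`, and `ratio` are only illustrative.

```r
# Median friend counts for ages 13, 14, 15 (copied from the summary table in these notes)
female <- c(148.0, 224.0, 276.0)
male   <- c(55.0, 92.5, 106.5)

# Element-wise ratio: how many times as many friends the median female user has
ratio <- female / male
round(ratio, 2)
```

A value above 1 means the median female user has more friends than the median male user of the same age.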
2. Converting long-format data to wide-format data
Data wrangling with R
Converting between wide and long formats
Melting data frames
> library(dplyr)
> pf.fc_by_age_gender <- pf %>%
+ filter(!is.na(gender)) %>%
+ group_by(age, gender) %>%
+ summarise(mean_friend_count = mean(friend_count),
+ median_friend_count = median(friend_count),
+ n = n()) %>%
+ arrange(age)
> head(pf.fc_by_age_gender)
# A tibble: 6 x 5
age gender mean_friend_count median_friend_count n
<int> <fct> <dbl> <dbl> <int>
1 13 female 259. 148. 193
2 13 male 102. 55.0 291
3 14 female 362. 224. 847
4 14 male 164. 92.5 1078
5 15 female 539. 276. 1139
6 15 male 201. 106. 1478
- Reshaping with tidyr
install.packages("tidyr")
library("tidyr")
> spread(subset(pf.fc_by_age_gender, select = c('gender', 'age', 'median_friend_count')), gender, median_friend_count)
# A tibble: 101 x 3
age female male
<int> <dbl> <dbl>
1 13 148. 55.0
2 14 224. 92.5
3 15 276. 106.
4 16 258. 136.
5 17 246. 125.
6 18 243. 122.
7 19 229. 130.
8 20 190. 112.
9 21 158. 108.
10 22 124. 97.0
# ... with 91 more rows
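In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The sketch below shows the same long-to-wide reshape on a small toy data frame; the object names `long` and `wide` are illustrative, and the values are copied from the first rows of the summary above.

```r
library(tidyr)

# Toy long-format data: one row per (age, gender) pair
long <- data.frame(
  age = c(13L, 13L, 14L, 14L),
  gender = factor(c("female", "male", "female", "male")),
  median_friend_count = c(148, 55, 224, 92.5)
)

# Each gender level becomes its own column, filled with the median friend count
wide <- pivot_wider(long, names_from = gender, values_from = median_friend_count)
wide
```

The result has one row per age and one column each for `female` and `male`, matching the shape produced by `spread()`.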
- Reshaping with reshape2
Getting started with reshape2
install.packages('reshape2')
library(reshape2)
> pf.fc_by_age_gender.wide <- dcast(pf.fc_by_age_gender, age ~ gender,value.var="median_friend_count")
> head(pf.fc_by_age_gender.wide)
age female male
1 13 148.0 55.0
2 14 224.0 92.5
3 15 276.0 106.5
4 16 258.5 136.0
5 17 245.5 125.0
6 18 243.0 122.0
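The reverse conversion, from wide back to long, can be done with `melt()` from the same package. This is a small sketch on toy data rather than the full `pf` data set; the object names `wide` and `long` are illustrative.

```r
library(reshape2)

# Toy wide-format data: one row per age, one column per gender
wide <- data.frame(
  age = c(13L, 14L),
  female = c(148, 224),
  male = c(55, 92.5)
)

# Melt the gender columns back into (gender, median_friend_count) pairs
long <- melt(wide, id.vars = "age",
             variable.name = "gender",
             value.name = "median_friend_count")
long
```

Each age now appears once per gender, which is the layout that `dcast()` started from.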
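3. Ratio plot
Plot the ratio of female to male median friend counts, and add a baseline at y = 1
library(ggplot2)
ggplot(aes(x = age, y = female / male), data = pf.fc_by_age_gender.wide) +
geom_line() +
geom_hline(yintercept = 1, alpha = 0.3, linetype = 2)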
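For readers without the `pf` data set at hand, the plot can be tried on the six rows shown in the reshape2 output above. This is only a sketch: the data frame `wide`, the plot object `p`, and the axis label text are assumptions added for illustration.

```r
library(ggplot2)

# First six rows of the wide table, copied from the dcast() output in these notes
wide <- data.frame(
  age = 13:18,
  female = c(148.0, 224.0, 276.0, 258.5, 245.5, 243.0),
  male = c(55.0, 92.5, 106.5, 136.0, 125.0, 122.0)
)

# Ratio line plus a dashed reference line at 1 (equal median friend counts)
p <- ggplot(wide, aes(x = age, y = female / male)) +
  geom_line() +
  geom_hline(yintercept = 1, alpha = 0.3, linetype = 2) +
  labs(x = "Age", y = "Median friend count ratio (female / male)")
p
```

Points above the dashed line are ages where the median female user has more friends than the median male user.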