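I. Factors
Factors store categorical variables, which come in two kinds:
1. Nominal: categories with no inherent order, e.g. sex, province, occupation.
2. Ordinal: categories with a natural order, e.g. grades such as excellent, good, fair, pass, fail.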
> x <- c("男", "女", "男", "女", "女", "男")  # 男 = male, 女 = female
> sex <- factor(x)
> sex
[1] 男 女 男 女 女 男
Levels: 男 女
> mode(sex)  # storage mode: a factor is stored as integer codes
[1] "numeric"
> class(sex)  # class
[1] "factor"
> as.numeric(sex)
[1] 1 2 1 2 2 1
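The integer codes returned by as.numeric() are positions in the factor's levels. A minimal sketch of that relationship, assuming English labels "male"/"female" stand in for 男/女 and that the level order is set explicitly (the default order comes from sorting the labels, which depends on the locale's collation):

```r
# English labels stand in for 男/女; the explicit levels argument fixes the order
x <- c("male", "female", "male", "female", "female", "male")
sex <- factor(x, levels = c("male", "female"))

levels(sex)                # the labels, in code order
nlevels(sex)               # number of levels
codes <- as.integer(sex)   # the stored integer codes
levels(sex)[codes]         # indexing the levels by the codes recovers the labels
as.character(sex)          # the direct way to get the labels back
```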
> x1 <- c("excellent", "average", "good", "poor", "average", "good")
> x2 <- factor(x1)
> x2
[1] excellent average good poor
[5] average good
Levels: average excellent good poor
> x3 <- as.numeric(x2)  # integer codes; by default the levels are in alphabetical order
> x3
[1] 2 1 3 4 1 3
> x4 <- factor(x2, ordered = TRUE, levels = c("excellent", "good", "average", "poor"))  # ordered = TRUE makes an ordered factor; levels sets the order
> x4
[1] excellent average good poor
[5] average good
4 Levels: excellent < good < ... < poor
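What an ordered factor adds is that comparisons, sorting, and min/max follow the declared level order rather than alphabetical order. A minimal sketch reusing the same grades and levels as above:

```r
x1 <- c("excellent", "average", "good", "poor", "average", "good")
x4 <- factor(x1, ordered = TRUE,
             levels = c("excellent", "good", "average", "poor"))

is.ordered(x4)     # TRUE for an ordered factor
x4 < "average"     # element-wise comparison against a level, using the level order
sort(x4)           # sorted by level order, not alphabetically
max(x4)            # the largest level present
```

With the levels listed from excellent to poor, as here, excellent is the smallest level and poor the largest; pass the levels in the reverse order if better grades should compare as greater.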
II. The table() function
table() counts how many times each level of a factor occurs (the frequency, or count, of each level).
> x <- c("男", "女", "男", "女", "女","男")
> sex <- factor(x)
> table(sex)
sex
男 女
3 3
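table() returns counts; relative frequencies (proportions) can be obtained with prop.table(). A minimal sketch, again assuming English labels in place of 男/女, plus a hypothetical extra level "other" to show that unused levels are still tabulated:

```r
x <- c("male", "female", "male", "female", "female", "male")
sex <- factor(x, levels = c("male", "female"))

counts <- table(sex)          # count per level
props  <- prop.table(counts)  # relative frequencies: counts / sum(counts)
counts
props

# "other" is a hypothetical level with no observations
sex3 <- factor(x, levels = c("male", "female", "other"))
table(sex3)                   # unused levels are listed with a count of 0
```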
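III. The tapply() function
tapply() splits a vector into groups defined by a factor and applies a function to each group; here it computes the mean height for each sex.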
> x <- c("男", "女", "男", "女", "女","男")
> sex <- factor(x)
> height <- c(165, 170, 168, 172, 159, 175)
> tapply(height, sex, mean)
男 女
169.3333 167.0000
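Conceptually, tapply() is "split, then apply". A minimal sketch of that equivalence with split() and sapply(), assuming English labels in place of 男/女 and the same heights as above:

```r
height <- c(165, 170, 168, 172, 159, 175)
sex <- factor(c("male", "female", "male", "female", "female", "male"),
              levels = c("male", "female"))

by_tapply <- tapply(height, sex, mean)

# The same computation spelled out: split the heights by level, then take each mean
groups  <- split(height, sex)   # list(male = c(165, 168, 175), female = c(170, 172, 159))
by_hand <- sapply(groups, mean)

by_tapply
by_hand
tapply(height, sex, max)        # any summary function works, e.g. the maximum per group
```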