Factors
A factor holds a categorical attribute, which can take only a limited number of values.
The term factor refers to a statistical data type used to store categorical variables. A categorical variable differs from a continuous variable in that it can take values from only a limited number of categories. A continuous variable can take any value in a range.
# Sex vector
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
# Convert sex_vector to a factor
factor_sex_vector <- factor(sex_vector)
# Print out factor_sex_vector
factor_sex_vector
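The sketch below looks at what factor() actually produces. It reuses sex_vector from above and inspects the levels and the integer codes the factor stores internally. By default the levels are the sorted unique values, so "Female" comes before "Male".

```r
# Same data as above
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_sex_vector <- factor(sex_vector)

# The distinct categories, sorted alphabetically by default
levels(factor_sex_vector)      # "Female" "Male"

# Each element is stored as an integer index into levels()
as.integer(factor_sex_vector)  # 2 1 1 2 2

# Number of categories
nlevels(factor_sex_vector)     # 2
```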
Renaming the levels of a factor:
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")
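Note: list the new names in the same order as the existing levels. By default factor() sorts the levels alphabetically, so "F" comes before "M" here.
Use levels(factor_survey_vector) to check the current order.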
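Renaming with levels<- is purely positional, so a wrong order silently mislabels the data. The first half of the sketch below shows that pitfall. The second half shows a safer alternative: pass both the original codes (levels) and the new names (labels) to factor() in one call, so each code is paired with its name explicitly. The variable name wrong is only illustrative.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Pitfall: the levels are c("F", "M"), so this renames F -> "Male" and M -> "Female"
wrong <- factor(survey_vector)
levels(wrong) <- c("Male", "Female")
as.character(wrong)  # "Female" "Male" "Male" "Female" "Female"

# Safer: map each original code to its label explicitly
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
as.character(factor_survey_vector)  # "Male" "Female" "Female" "Male" "Male"
```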
Viewing a summary:
# Build factor_survey_vector with clean levels
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
# Generate summary for survey_vector
summary(survey_vector)
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
Result:
> summary(survey_vector)
   Length     Class      Mode
        5 character character
> summary(factor_survey_vector)
Female   Male
     2      3
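For a factor, summary() counts how many elements fall into each level. The sketch below gets the same counts with table(). It also shows that a declared level with no observations still appears, with a count of 0. The vector only_male is a made-up example.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))

summary(factor_survey_vector)  # Female 2, Male 3
table(factor_survey_vector)    # same counts, as a table object

# Declared levels are kept even when they have no observations
only_male <- factor(c("M", "M"), levels = c("F", "M"))
summary(only_male)             # F 0, M 2
```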
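Note: the elements of an ordinary (unordered) factor cannot be ranked against each other.
If you compare values in such a factor with operators like < or >, R returns NA and gives a warning, since ordering nominal categories doesn't make sense.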
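The sketch below demonstrates this behaviour. It rebuilds factor_survey_vector so the snippet runs on its own. The ordering operators give NA plus a warning, while == and != still work on an unordered factor. The names male, female and bigger are only illustrative.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
male <- factor_survey_vector[1]
female <- factor_survey_vector[2]

# '>' is not meaningful for an unordered factor: R warns and returns NA
bigger <- suppressWarnings(male > female)
bigger           # NA

# Equality tests are still allowed
male == female   # FALSE
```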
In an ordered factor, the elements can be compared:
# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))
Result:
> factor_speed_vector
[1] medium slow   slow   medium fast
Levels: slow < medium < fast
> summary(factor_speed_vector)
  slow medium   fast
     2      2      1
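The explicit levels argument matters here. Without it, R again sorts the values alphabetically, and that order does not match the intended slow-to-fast scale. The sketch below contrasts the two cases. It also shows ordered(), base R's shorthand for factor(..., ordered = TRUE). The names alpha_order and same are only illustrative.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")

# No levels given: alphabetical order, i.e. fast < medium < slow
alpha_order <- factor(speed_vector, ordered = TRUE)
levels(alpha_order)   # "fast" "medium" "slow"

# Explicit levels give the intended order slow < medium < fast
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# ordered() builds the same object
same <- ordered(speed_vector, levels = c("slow", "medium", "fast"))
identical(same, factor_speed_vector)  # TRUE
```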
Comparing elements:
> flg <- factor_speed_vector[1] > factor_speed_vector[2]
> flg
[1] TRUE
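Comparisons on an ordered factor are also vectorised. You can compare against a level given as a string, which R matches against levels(). min() and max() follow the declared level order as well. A minimal sketch, where faster_than_slow is an illustrative name:

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Element-wise comparison against a single level
faster_than_slow <- factor_speed_vector > "slow"
faster_than_slow              # TRUE FALSE FALSE TRUE TRUE

# Extremes follow the declared order, not the alphabet
max(factor_speed_vector)      # fast
min(factor_speed_vector)      # slow
```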