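Factors (factor)
- Categorical data, either ordered or unordered
- An integer vector plus labels (preferable to a bare integer vector, because the labels are self-describing)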
> x <- factor(c("female","female","male","male"))
> x
[1] female female male   male
Levels: female male
> x <- factor(c("female","female","male","male"), levels=c("male","female")) # the level listed first is the baseline level
> x
[1] female female male   male
Levels: male female
> table(x) # get an overview of the factor
x
  male female
     2      2
> unclass(x) # drop the factor class, leaving the integer codes
[1] 2 2 1 1
attr(,"levels")
[1] "male"   "female"
> class(unclass(x))
[1] "integer"
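The notes mention ordered versus unordered factors but only show the unordered case. The following is a minimal sketch of the difference, using a made-up `sizes` vector (the data and variable names are illustrative assumptions, not part of the original notes):

```r
# Hypothetical data: clothing sizes with a natural order.
sizes <- c("small", "large", "medium", "small")

# Unordered factor: levels default to sorted (here alphabetical) order.
f <- factor(sizes)
levels(f)            # "large" "medium" "small"

# Ordered factor: the order given in `levels` defines the ranking.
o <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
o < "large"          # comparisons are only meaningful for ordered factors
as.integer(o)        # underlying integer codes: 1 3 2 1

# relevel() moves one level to the front, making it the baseline.
f2 <- relevel(f, ref = "small")
levels(f2)           # "small" "large" "medium"
```

`relevel()` is one way to change the baseline of an existing factor without retyping all of its levels.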
Missing values
- NA (any type: numeric, character, etc.) / NaN (numeric only): is.na() treats NaN as missing, but NA is not NaN
- is.na() / is.nan()
> m <- c(1,NA,2,NA,3)
> m
[1]  1 NA  2 NA  3
> is.na(m)
[1] FALSE  TRUE FALSE  TRUE FALSE
> is.nan(m)
[1] FALSE FALSE FALSE FALSE FALSE
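The transcript above contains no NaN, so it does not show the asymmetry between the two tests. A small sketch that does, with illustrative variable names `v` and `s`:

```r
v <- c(1, NA, 0/0)        # 0/0 evaluates to NaN
is.na(v)                  # FALSE TRUE TRUE: NaN is also reported as missing
is.nan(v)                 # FALSE FALSE TRUE: NA is not NaN

# NA exists for other types too; NaN is numeric only.
s <- c("a", NA)
is.na(s)                  # FALSE TRUE

# Summaries propagate missing values unless asked to drop them.
mean(v)                   # a missing result
mean(v, na.rm = TRUE)     # 1
sum(is.na(v))             # 2: count of missing entries
```

Counting with `sum(is.na(x))` is a common first check before deciding whether `na.rm = TRUE` is appropriate.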
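Data frames
- Store tabular data
- Can be viewed as a list whose elements all have the same length
- Each list element is one column of data
- The length of each element is the number of rows
- Elements (columns) may have different types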
> df <-data.frame(id=c(1,2,3,4),name=c("a","b","c","d"))
> df
  id name
1  1    a
2  2    b
3  3    c
4  4    d
> nrow(df)
[1] 4
> ncol(df)
[1] 2
> data.matrix(df) # convert to a matrix
     id name
[1,]  1    1
[2,]  2    2
[3,]  3    3
[4,]  4    4
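The claim that a data frame is a list of equal-length columns can be checked directly. A short sketch that rebuilds the same `df` as above:

```r
df <- data.frame(id = c(1, 2, 3, 4), name = c("a", "b", "c", "d"))

is.list(df)          # TRUE: a data frame is a list underneath
length(df)           # 2: one list element per column
sapply(df, length)   # every column has length 4, the number of rows
df[["name"]]         # extract one column, list-style
df$id                # the same kind of extraction using $
str(df)              # compact overview of the column types
```

Because the columns are list elements, list tools such as `sapply()` and `[[` work on a data frame column by column.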
Dates and times
Dates: Date
> dt<-date()
> dt
[1] "Fri Feb 28 11:25:28 2020"
> class(dt)
[1] "character"
> dt <-Sys.Date()
> dt
[1] "2020-02-28"
> class(dt)
[1] "Date"
> x3 <- as.Date("2015-01-01") # store a date
> x3
[1] "2015-01-01"
> weekdays(x3)
[1] "Thursday"
> months(x3)
[1] "January"
> quarters(x3)
[1] "Q1"
> julian(x3)
[1] 16436
attr(,"origin")
[1] "1970-01-01"
> x3 <- as.Date("2015-01-01")
> x4 <- as.Date("2016-01-01")
> x4-x3
Time difference of 365 days
> as.numeric(x4-x3) # coerce to a plain number
[1] 365
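A Date is stored as a count of days since 1970-01-01, which is why subtraction and `julian()` behave as shown above. The sketch below illustrates a few more operations built on that representation (the names `d1` and `d2` are illustrative):

```r
d1 <- as.Date("2015-01-01")
d2 <- as.Date("2016-01-01")

unclass(d1)                                   # 16436: days since 1970-01-01
difftime(d2, d1, units = "weeks")             # the same gap expressed in weeks
as.numeric(difftime(d2, d1, units = "days"))  # 365

d1 + 30                                       # adding a number shifts by days: "2015-01-31"
seq(d1, by = "month", length.out = 3)         # "2015-01-01" "2015-02-01" "2015-03-01"
format(d1, "%Y/%m/%d")                        # "2015/01/01"
```

Using `difftime()` with an explicit `units` argument avoids relying on the unit R picks automatically for `x4-x3`.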
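Times: POSIXct/POSIXlt
Measured in seconds since 1970-01-01; Sys.time() returns the current time
POSIXct: a single number, commonly used when storing times in a data frame
POSIXlt: a list that also holds fields such as weekday, year, month, and day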
> x5 <- Sys.time()
> x5
[1] "2020-02-28 11:34:31 CST"
> class(x5)
[1] "POSIXct" "POSIXt"
> p<-as.POSIXlt(x5)
> p
[1] "2020-02-28 11:34:31 CST"
> class(p)
[1] "POSIXlt" "POSIXt"
> names(unclass(p)) # list the component names
[1] "sec" "min" "hour" "mday" "mon" "year" "wday"
[8] "yday" "isdst" "zone" "gmtoff"
> p$sec # inspect the value of one component
[1] 31.83603
> p$yday
[1] 58
> as.POSIXct(p)
[1] "2020-02-28 11:34:31 CST"
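Several POSIXlt components use offsets that are easy to misread (for example `p$yday` above is 58 for February 28 because counting starts at 0). A sketch with a fixed instant, using `tz = "UTC"` as an assumption so the result does not depend on the local time zone:

```r
# A fixed instant in UTC (illustrative; the notes used Sys.time()).
ct <- as.POSIXct("2020-02-28 11:34:31", tz = "UTC")
unclass(ct)              # one number: seconds since 1970-01-01 00:00:00 UTC

lt <- as.POSIXlt(ct, tz = "UTC")
lt$year + 1900           # 2020: year is stored as years since 1900
lt$mon + 1               # 2: months are numbered 0-11
lt$mday                  # 28: day of the month
lt$wday                  # 5: days of the week are 0-6, Sunday is 0
lt$yday                  # 58: day of the year, counted from 0

ct + 60                  # arithmetic on POSIXct is in seconds
```

POSIXct is convenient for storage and arithmetic; POSIXlt is convenient when individual fields are needed.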
> x6 <- "Jan1,2015 01:01"
> x6
[1] "Jan1,2015 01:01"
> strptime(x6,"%B %d, %Y %H:%M")
[1] "2015-01-01 01:01:00 CST"
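Month names matched by `%B` depend on the current locale, so the call above may behave differently on a non-English system. The sketch below uses a purely numeric format and an explicit `tz = "UTC"` (both are assumptions made for reproducibility, not part of the original notes) and also shows the reverse direction with `format()`:

```r
s <- "2015-01-01 01:01"
t1 <- strptime(s, format = "%Y-%m-%d %H:%M", tz = "UTC")
class(t1)                              # "POSIXlt" "POSIXt"
format(t1, "%d/%m/%Y %H:%M")           # "01/01/2015 01:01"

# A string that does not match the format yields NA rather than an error.
bad <- strptime("2015/01/01", format = "%Y-%m-%d", tz = "UTC")
is.na(bad)                             # TRUE

# Convert to POSIXct before storing the value in a data frame.
ct <- as.POSIXct(t1)
class(ct)                              # "POSIXct" "POSIXt"
```

Checking `is.na()` on the result is a simple way to catch strings that did not match the format.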