Commonly Used R Functions at Work (continuously updated...)

Article link: https://blog.youkuaiyun.com/u010035907/article/details/55190629

This article collects practical R techniques for time series work: generating date data, inspecting data, building time features, and merging data sets.


1. Generating date data

seq(as.Date("2015/12/14"), by = "week", length.out = 62)   # one date per week, 62 dates in total

seq(as.Date("2015/12/14"), by = "3 days", length.out = 62) # one date every 3 days, 62 dates in total
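As a quick check on the two calls above, here is a minimal sketch (the variable names `weekly` and `every3` are made up for illustration) confirming that `by` controls the spacing and `length.out` the number of dates:

```r
weekly <- seq(as.Date("2015/12/14"), by = "week", length.out = 62)
every3 <- seq(as.Date("2015/12/14"), by = "3 days", length.out = 62)

length(weekly)                    # 62 dates in the sequence
unique(as.numeric(diff(weekly)))  # 7: consecutive dates are 7 days apart
unique(as.numeric(diff(every3)))  # 3: consecutive dates are 3 days apart
range(weekly)                     # first and last date of the weekly sequence
```

Other step strings such as "month" or "2 weeks" follow the same pattern.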

2. Functions for inspecting data

> str(tsdata_tmp)
'data.frame':   1116 obs. of  6 variables:
 $ corname    : chr  "日本" "日本" "日本" "日本" ...
 $ cityname   : chr  "东京" "东京" "东京" "东京" ...
 $ date       : chr  "2015-12-21" "2015-12-28" "2016-01-04" "2016-01-11" ...
 $ weeknum    : int  1 2 3 4 5 6 7 8 9 1 ...
 $ ciiquantity: int  9386 8521 5224 7770 10610 12100 11413 15695 10926 309 ...
 $ y_stlf     : num  8593 8312 6515 7452 7965 ...
> str(advancedbooking_tmp)
'data.frame':   1539 obs. of  5 variables:
 $ cityid     : int  228 228 228 228 228 228 228 228 228 228 ...
 $ cityname   : chr  "东京" "东京" "东京" "东京" ...
 $ date       : chr  "2015/11/30" "2015/11/30" "2015/11/30" "2015/11/30" ...
 $ weeknum    : int  3 1 2 5 6 4 8 7 6 3 ...
 $ ciiquantity: int  0 0 0 0 0 0 0 0 0 0 ...
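The str() output shows that both date columns are stored as chr, and that one uses "-" while the other uses "/" as the separator. Before comparing or merging on date, it is probably worth converting both to Date. A minimal sketch, using toy stand-ins that reuse the data frame names above but contain only a few of the date strings:

```r
# Toy stand-ins for the two data frames; only the date column is reproduced
tsdata_tmp <- data.frame(date = c("2015-12-21", "2015-12-28", "2016-01-04"),
                         stringsAsFactors = FALSE)
advancedbooking_tmp <- data.frame(date = c("2015/11/30", "2015/11/30"),
                                  stringsAsFactors = FALSE)

# State the format explicitly so each separator is parsed as intended
tsdata_tmp$date <- as.Date(tsdata_tmp$date, format = "%Y-%m-%d")
advancedbooking_tmp$date <- as.Date(advancedbooking_tmp$date, format = "%Y/%m/%d")

str(tsdata_tmp)                  # date is now of class Date
class(advancedbooking_tmp$date)  # "Date"
anyNA(tsdata_tmp$date)           # TRUE would flag strings that failed to parse
```

Strings that do not match the given format become NA, so the anyNA() check is a cheap way to catch bad rows.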

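3. Common functions for building time features when modeling in R

library(lubridate)

DataSet$quarter <- quarter(DataSet$date) # quarter of the year (1 to 4)

DataSet$month <- month(DataSet$date) # month of the year (1 to 12)

DataSet$week <- week(DataSet$date) # week of the year, counted in 7-day blocks from January 1

isoweek('2017-01-01') # ISO 8601 week of the year (weeks start on Monday)

?lubridate::week # open the help page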
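week() and isoweek() can disagree near the start of a year. The sketch below reproduces both conventions in base R so the difference is visible without extra packages. It assumes week() means "complete seven-day periods since January 1, plus one", as the lubridate help page describes; the variable names are made up for illustration.

```r
d <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-08"))

# Complete seven-day periods since January 1, plus one
# (yday in POSIXlt is 0-based, so January 1 has yday 0)
week_from_jan1 <- as.POSIXlt(d)$yday %/% 7 + 1

# ISO 8601 week: weeks start on Monday, week 1 contains the first Thursday
iso_week <- as.integer(format(d, "%V"))

data.frame(date = d, week_from_jan1, iso_week)
```

2017-01-01 is a Sunday, so it falls in week 1 by the first rule but still belongs to ISO week 52 of 2016. Which convention suits a model depends on whether the weekly features should line up with calendar years or with Monday-based weeks.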

4. Merging Data

Adding Columns

To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).

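# merge two data frames by ID

total <- merge(dataframeA, dataframeB, by = "ID") # keep ID unique within each frame for a one-to-one join; if an ID repeats, merge returns every matching combination of rows

# merge two data frames by ID and Country

total <- merge(dataframeA, dataframeB, by = c("ID", "Country")) # rows match only when both ID and Country agree; the same uniqueness caveat applies to the (ID, Country) pair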
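Why key uniqueness matters can be seen with a small, made-up pair of data frames (the column names a and b and all values are hypothetical). merge does not fail on repeated keys; it pairs every matching row on one side with every matching row on the other:

```r
dataframeA <- data.frame(ID = c(1, 1, 2), a = c("a1", "a2", "a3"))
dataframeB <- data.frame(ID = c(1, 1, 3), b = c("b1", "b2", "b3"))

total <- merge(dataframeA, dataframeB, by = "ID")
nrow(total)   # 4: the two ID == 1 rows on each side pair up 2 x 2

# Check a key column for duplicates before merging
anyDuplicated(dataframeA$ID) > 0   # TRUE here, so the join is not one-to-one
```

If the row count after a merge is larger than expected, duplicated keys are usually the first thing to check.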

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
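A minimal sketch of the by.x / by.y case, assuming the key is called CustomerId in df1 and cust_id in df2 (the second name and all values are invented for illustration):

```r
df1 <- data.frame(CustomerId = 1:3, Product = c("Toaster", "Radio", "Kettle"))
df2 <- data.frame(cust_id = 2:4, State = c("Alabama", "Ohio", "Texas"))

merged <- merge(df1, df2, by.x = "CustomerId", by.y = "cust_id")
merged          # inner join: only customers 2 and 3 appear in both frames
names(merged)   # the key column keeps its name from the first frame
```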

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE) # keeps unmatched rows from both frames, filling the gaps with NA

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE) # keeps every row of df1

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE) # keeps every row of df2

Cross join: merge(x = df1, y = df2, by = NULL) # pairs every row of df1 with every row of df2
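The join types can be compared side by side on two small, made-up data frames (all values are hypothetical); counting rows is an easy way to see what each option keeps:

```r
df1 <- data.frame(CustomerId = 1:4,
                  Product = c("Toaster", "Radio", "Kettle", "Lamp"))
df2 <- data.frame(CustomerId = c(2L, 4L, 6L),
                  State = c("Alabama", "Ohio", "Texas"))

inner <- merge(df1, df2, by = "CustomerId")                # ids 2, 4
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # ids 1, 2, 3, 4, 6
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # ids 1, 2, 3, 4
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # ids 2, 4, 6
cross <- merge(df1, df2, by = NULL)                        # 4 x 3 = 12 rows

# Row counts per join type
sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)
```

In the outer, left, and right results, columns from the frame with no matching row are filled with NA.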

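5. Changing values in a data frame

To replace the values in one column of a data frame that are less than 0 with the value from another column in the same row, you can do the following:

neg <- which(output_results$pred < 0) # rows where the prediction is negative
output_results$pred[neg] <- output_results$act_quantity[neg] # replace them with act_quantity from the same row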
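A small worked example of this replacement, using a made-up output_results with the same two column names (the values are invented for illustration):

```r
output_results <- data.frame(pred = c(12.5, -3.0, 8.0, -0.7),
                             act_quantity = c(11, 4, 9, 2))

# which() drops NA comparisons, so rows with a missing pred are left alone
neg <- which(output_results$pred < 0)
output_results$pred[neg] <- output_results$act_quantity[neg]

output_results$pred   # 12.5 4.0 8.0 2.0
```

Only rows 2 and 4 change; the non-negative predictions are left untouched.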