Time Series Clustering and Classification

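This article shows how to cluster and classify time series in R: Dynamic Time Warping (DTW) is used to compute a distance matrix, to which hierarchical clustering is applied, and the Discrete Wavelet Transform (DWT) is used to extract features for building a decision tree classifier.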


http://www.rdatamining.com/examples/time-series-clustering-classification


This page shows code examples of time series clustering and classification with R.

Time Series Clustering

Time series clustering partitions time series data into groups based on similarity or distance, so that time series in the same cluster are similar to each other. With R, the first step is to choose an appropriate distance/similarity metric; the second step is to apply an existing clustering technique, such as k-means, hierarchical clustering, density-based clustering or subspace clustering, to find the clustering structure.

Dynamic Time Warping (DTW) finds an optimal alignment between two time series, and the DTW distance is used as the distance metric in the example below.
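To make the idea of an optimal alignment concrete, here is a minimal base-R sketch of the classic dynamic programming recurrence behind DTW. The function name dtw_distance, the absolute-difference local cost and the three allowed steps are assumptions made for illustration. The dtw package uses its own step patterns, so its distances may differ from the values computed here.

```r
# Minimal DTW distance between two numeric vectors (illustrative sketch only).
# Local cost: absolute difference. Allowed steps: diagonal, up, left.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # D[i + 1, j + 1] is the cheapest cumulative cost of aligning x[1..i] with y[1..j]
  D <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    }
  }
  D[n + 1, m + 1]
}

a <- c(0, 1, 2, 1, 0)
b <- c(0, 0, 1, 2, 1, 0)  # same shape as a, delayed by one time step
d_ab <- dtw_distance(a, b)
print(d_ab)
```

Because b is only a delayed copy of a, the warping path can absorb the delay, which is why DTW suits series that share a shape but are shifted in time.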

The Synthetic Control Chart Time Series data set is used here. It contains 600 examples of control charts, each of which is a time series with 60 values. There are six classes of 100 examples each: 1) 1-100 Normal, 2) 101-200 Cyclic, 3) 201-300 Increasing trend, 4) 301-400 Decreasing trend, 5) 401-500 Upward shift, and 6) 501-600 Downward shift. The dataset can be downloaded from the UCI KDD Archive.

> sc <- read.table("E:/Rtmp/synthetic_control.data", header=FALSE, sep="")

# randomly sample n cases from each class, to keep the plot readable

> n <- 10

> s <- sample(1:100, n)

> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)

> sample2 <- sc[idx,]

> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))

# compute DTW distances

> library(dtw)

> distMatrix <- dist(sample2, method="DTW")

# hierarchical clustering

> hc <- hclust(distMatrix, method="average")

> plot(hc, labels=observedLabels, main="")
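A dendrogram like the one plotted above can also be cut into a fixed number of clusters and compared with the known labels. The sketch below does this on made-up toy data, with Euclidean distance from base R standing in for DTW so that it runs without the data file or the dtw package. All names starting with toy are hypothetical.

```r
set.seed(1)

# Toy data (made up): 5 flat series and 5 upward-trending series, 20 points each
flat  <- t(replicate(5, rnorm(20, mean = 0, sd = 0.1)))
trend <- t(replicate(5, seq(0, 5, length.out = 20) + rnorm(20, sd = 0.1)))
toy <- rbind(flat, trend)
toyLabels <- rep(1:2, each = 5)

# Euclidean distance stands in for DTW here (an assumption of this sketch)
toyDist <- dist(toy)
toyHc <- hclust(toyDist, method = "average")

# cut the dendrogram into 2 clusters and cross-tabulate against the labels
toyClusters <- cutree(toyHc, k = 2)
print(table(toyLabels, toyClusters))
```

For the control chart sample, the same two calls would be cutree(hc, k=6) followed by table(observedLabels, ...), assuming six clusters are wanted.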


 

Time Series Classification

Time series classification builds a classification model from labelled time series and then uses the model to predict the labels of unlabelled time series. With R, the approach is first to extract features from the time series data, and then to apply an existing classification technique, such as SVM, k-NN, neural networks, regression or decision trees, to the feature set.

The Discrete Wavelet Transform (DWT) provides a multi-resolution representation using wavelets and is used in the example below. Another popular feature extraction technique is the Discrete Fourier Transform (DFT).
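To show what the multi-resolution representation consists of, here is a minimal base-R sketch of one level of the Haar transform. The function name haar_step is hypothetical, and the sign and ordering conventions are assumptions that may differ from those of the wavelets package.

```r
# One level of the Haar DWT for an even-length numeric vector (illustrative sketch).
haar_step <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd  <- x[seq(1, length(x), by = 2)]
  even <- x[seq(2, length(x), by = 2)]
  list(
    detail = (even - odd) / sqrt(2),  # wavelet coefficients: scaled local differences
    smooth = (even + odd) / sqrt(2)   # scaling coefficients: scaled local sums
  )
}

x <- c(4, 6, 10, 12, 8, 8, 2, 0)
level1 <- haar_step(x)
level2 <- haar_step(level1$smooth)  # repeat on the smooth part for the next level

print(level1$detail)
print(level2$smooth)
```

Each level halves the length of the smooth part, so the detail coefficients of all levels plus the final smooth coefficients together describe the series at several resolutions. The feature vector built in the example below is of this kind.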

# extracting DWT coefficients (with Haar filter)

> library(wavelets)

> wtData <- NULL

> for (i in 1:nrow(sc)) {

+  a <- t(sc[i,])

+  wt <- dwt(a, filter="haar", boundary="periodic")

+  wtData <- rbind(wtData, unlist(c(wt@W,wt@V[[wt@level]])))

+ }

> wtData <- as.data.frame(wtData)

 

# set class labels into categorical values

> classId <- c(rep("1",100), rep("2",100), rep("3",100),

+  rep("4",100), rep("5",100), rep("6",100))

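# convert classId to a factor explicitly; R 4.0 and later no longer turn strings into factors by default

> wtSc <- data.frame(classId=factor(classId), wtData)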

 

# build a decision tree with ctree() in package party

> library(party)

> ct <- ctree(classId ~ ., data=wtSc,

+  controls = ctree_control(minsplit=30, minbucket=10, maxdepth=5))

> pClassId <- predict(ct)

 

# check predicted classes against original class labels

> table(classId, pClassId)

      

# accuracy (on the training data)

> (sum(classId==pClassId)) / nrow(wtSc)

[1] 0.8716667
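The relation between the confusion table and the accuracy figure can be checked on a small made-up example. The label vectors below are hypothetical and are not taken from the control chart data.

```r
# Hypothetical true and predicted labels, for illustration only
actual    <- c("1", "1", "2", "2", "3", "3", "3", "1")
predicted <- c("1", "2", "2", "2", "3", "3", "1", "1")

# use the same factor levels on both sides so that the table is square
lv <- sort(unique(c(actual, predicted)))
confusion <- table(actual = factor(actual, levels = lv),
                   predicted = factor(predicted, levels = lv))
print(confusion)

# accuracy is the share of cases on the diagonal of the confusion table
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The diagonal of the table counts the correctly predicted cases, so this gives the same value as the proportion of matching labels.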

 

> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))


More examples of time series analysis and mining with R, and of other data mining techniques, can be found in my book "R and Data Mining: Examples and Case Studies", which is downloadable as a PDF file at the link.
