R Modeling (step 4)

This article shows how to implement key statistical learning methods in R, including supervised learning algorithms such as linear regression, logistic regression, decision trees, random forests, and support vector machines, as well as unsupervised learning algorithms such as k-means and hierarchical clustering. Time series analysis methods are also covered.


[I] Regression

OLS regression

Description | Function
simple linear regression | lm(Y~X1,data)
polynomial regression | lm(Y~X1+...+I(X^2),data)
multiple linear regression | lm(Y~X1+X2+...+Xk)
multiple linear regression with interaction terms | lm(Y~X1+X2+X1:X2)
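
As a quick illustration, a minimal sketch using the built-in women dataset (height and weight); the models are examples only, not part of the original tables:

> fit <- lm(weight ~ height, data=women)                   # simple linear regression
> summary(fit)                                             # coefficients, R-squared, F test
> fit2 <- lm(weight ~ height + I(height^2), data=women)    # polynomial term via I()
> summary(fit2)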

selecting the optimal regression model

Function | Description
anova(fit1,fit2) | compare the fit of nested models
AIC(fit1,fit2) | compare models by AIC
stepAIC(fit,direction=) | stepwise selection: "forward"/"backward"
regsubsets() | all-subsets regression
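
For context, stepAIC() lives in the MASS package and regsubsets() in leaps. A hedged sketch on the built-in mtcars data; the model itself is only an assumption for illustration:

> library(MASS)
> fit <- lm(mpg ~ wt + hp + disp + drat, data=mtcars)
> stepAIC(fit, direction="backward")           # drop predictors while AIC improves
> library(leaps)
> subsets <- regsubsets(mpg ~ wt + hp + disp + drat, data=mtcars, nbest=4)
> plot(subsets, scale="adjr2")                 # compare all subsets by adjusted R-squared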

regression diagnostics

Function | Description
plot() | diagnostic plots
qqPlot() | quantile comparison plot
durbinWatsonTest() | Durbin-Watson test for autocorrelated errors
crPlots() | component-plus-residual plots
ncvTest() | score test of non-constant error variance
spreadLevelPlot() | spread-level plot
outlierTest() | Bonferroni outlier test
avPlots() | added-variable plots
influencePlot() | regression influence plot
scatterplot() | enhanced scatter plot
scatterplotMatrix() | enhanced scatter plot matrix
vif() | variance inflation factors
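
Most of these diagnostics come from the car package. A minimal sketch, assuming a fitted lm object (the mtcars model is illustrative only):

> library(car)
> fit <- lm(mpg ~ wt + hp, data=mtcars)
> qqPlot(fit)               # studentized residuals vs. t quantiles (normality)
> durbinWatsonTest(fit)     # test for autocorrelated errors
> crPlots(fit)              # component-plus-residual plots (linearity)
> ncvTest(fit)              # score test of non-constant error variance
> vif(fit)                  # variance inflation factors (multicollinearity)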

unusual observations

  • types of unusual observations

1. outliers: outlierTest()
2. high-leverage points: hat.plot() (see the sketch after this list)
3. influential observations: Cook's distance

  • corrective measures

1. delete the offending observations
2. transform variables
3. add or delete variables
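
Note that hat.plot() is not a base R function; a minimal sketch built on hatvalues(), with the usual cutoffs at 2 and 3 times the average hat value:

> hat.plot <- function(fit){
       p <- length(coefficients(fit))    # number of estimated parameters
       n <- length(fitted(fit))          # number of observations
       plot(hatvalues(fit), main="Index Plot of Hat Values")
       abline(h=c(2,3)*p/n, col="red", lty=2)   # common leverage cutoffs
       identify(1:n, hatvalues(fit), names(hatvalues(fit)))
  }
> hat.plot(fit)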

Generalized Linear Models

generalized linear models

  • glm()
    Distribution Family | Default Link Function
    binomial | (link = "logit")
    gaussian | (link = "identity")
    Gamma | (link = "inverse")
    inverse.gaussian | (link = "1/mu^2")
    poisson | (link = "log")
    quasi | (link = "identity", variance = "constant")
    quasibinomial | (link = "logit")
    quasipoisson | (link = "log")
  • functions used with glm() (a quick sketch follows the table)
    Function | Description
    summary() | displays details of the fitted model
    coefficients()/coef() | lists the parameters of the fitted model
    confint() | gives confidence intervals for the model parameters
    residuals() | lists the residuals of the fitted model
    anova() | generates an ANOVA table comparing two fitted models
    plot() | generates diagnostic plots for evaluating the fitted model
    predict() | forecasts on new data with the fitted model
    deviance() | returns the deviance of the fitted model
    df.residual() | returns the residual degrees of freedom of the fitted model
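
A brief sketch of these helpers on a toy fit; the mtcars Poisson model below is an assumption for illustration only:

> fit <- glm(carb ~ mpg + cyl, data=mtcars, family=poisson())
> summary(fit)          # details of the fitted model
> coef(fit)             # parameter estimates
> confint(fit)          # confidence intervals for the parameters
> deviance(fit)         # deviance of the fitted model
> df.residual(fit)      # residual degrees of freedom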

logistic regression

> data(Affairs,package="AER")
> summary(Affairs)
> table(Affairs$affairs)
> Affairs$ynaffair[Affairs$affairs > 0] <-1
> Affairs$ynaffair[Affairs$affairs == 0] <-0
> Affairs$ynaffair <- factor(Affairs$ynaffair,levels=c(0,1),labels=c("No","Yes"))
> table(Affairs$ynaffair)
> fit.full <-glm(ynaffair ~ gender + age + yearsmarried + children +
             religiousness + education + occupation + rating,
             family = binomial(),data = Affairs)
> fit.reduced <-glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                      family = binomial(),data = Affairs)
> anova(fit.reduced,fit.full,test="Chisq")
> coef(fit.reduced)
> exp(coef(fit.reduced))
> testdata<-data.frame(rating=c(1,2,3,4,5),age=mean(Affairs$age),
                                   yearsmarried=mean(Affairs$yearsmarried),
                                   religiousness=mean(Affairs$religiousness))
> testdata
> testdata$prob <- predict(fit.reduced,newdata=testdata,type="response")
> testdata <- data.frame(rating = mean(Affairs$rating),
                                    age=seq(17,57,10),
                                    yearsmarried=mean(Affairs$yearsmarried),
                                    religiousness=mean(Affairs$religiousness))
> testdata
> testdata$prob<-predict(fit.reduced,newdata=testdata,type="response")
> testdata
> deviance(fit.reduced)/df.residual(fit.reduced)
> fit.od <- glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                 family = quasibinomial(),data=Affairs)
> pchisq(summary(fit.od)$dispersion * fit.reduced$df.residual,
             fit.reduced$df.residual,lower=F)

Poisson regression

> data(breslow.dat,package="robust")
> names(breslow.dat)
> summary(breslow.dat[c(6,7,8,10)])
> opar<-par(no.readonly=TRUE)
> par(mfrow=c(1,2))
> attach(breslow.dat)
> hist(sumY,breaks=20,xlab="Seizure Count",main="Distribution of Seizures")
> boxplot(sumY~Trt,xlab="Treatment",main="Group Comparisons")
> par(opar)
> fit<-glm(sumY~Base + Age +Trt,data=breslow.dat,family=poisson())
> summary(fit)
> coef(fit)
> exp(coef(fit))
> deviance(fit)/df.residual(fit)
> library(qcc)
> qcc.overdispersion.test(breslow.dat$sumY,type="poisson")
> fit.od<-glm(sumY~Base + Age + Trt,data=breslow.dat,
       family=quasipoisson())
> summary(fit.od)
> fit<-glm(sumY~Base + Age + Trt,data=breslow.dat,offset=log(time),family=poisson)

[II] Clustering

1. cluster analysis steps

1. choose the appropriate variables
2. scale the data
3. look for anomalies
4. calculate distances
5. select a clustering algorithm
6. obtain one or more cluster solutions
7. determine the number of clusters present
8. obtain the final cluster solution
9. visualize the results
10. interpret the clusters
11. validate the results

2. calculating distances

> data(nutrient,package="flexclust")
> head(nutrient,4)
> d<-dist(nutrient)
> as.matrix(d)[1:4,1:4]

3. hierarchical cluster analysis

> data(nutrient,package="flexclust")
> row.names(nutrient) <- tolower(row.names(nutrient))
> nutrient.scaled<-scale(nutrient)

> d<-dist(nutrient.scaled)

> fit.average <-hclust(d,method="average")
> plot(fit.average,hang=-1,cex=.8,main="Average Linkage Clustering")
> library(NbClust)
> devAskNewPage(ask=TRUE)
> nc<-NbClust(nutrient.scaled,distance="euclidean",
                       min.nc=2,max.nc=15,method="average")
> table(nc$Best.n[1,])
> barplot(table(nc$Best.n[1,]),
              xlab="Number of Clusters",ylab="Number of Criteria",
              main="Number of Clusters Chosen by 26 Criteria")
> clusters<-cutree(fit.average,k=5)
> table(clusters)
> aggregate(nutrient,by=list(cluster=clusters),median)
> aggregate(as.data.frame(nutrient.scaled),by=list(cluster=clusters),median)
> plot(fit.average,hang=-1,cex=.8,
         main="Average Linkage Clustering\n5 Cluster Solution")
> rect.hclust(fit.average,k=5)

4. partitioning cluster analysis

  • K-means clustering
> wssplot<-function(data,nc=15,seed=1234){
       wss<-(nrow(data)-1)*sum(apply(data,2,var))       # total within-group SS for k=1
       for(i in 2:nc){
            set.seed(seed)
            wss[i]<-sum(kmeans(data,centers=i)$withinss)}
       plot(1:nc,wss,type="b",xlab="Number of Clusters",
            ylab="Within groups sum of squares")}
> data(wine,package="rattle")                           # load the wine data used below
> head(wine)
> df<-scale(wine[-1])
> wssplot(df)
> library(NbClust)
> set.seed(1234)
> devAskNewPage(ask=TRUE)
> nc<-NbClust(df,min.nc=2,max.nc=15,method="kmeans")
> table(nc$Best.n[1,])
> barplot(table(nc$Best.n[1,]),
              xlab="Number of Clusters",ylab="Number of Criteria",
              main="Number of Clusters Chosen by 26 Criteria")
> set.seed(1234)
> fit.km<-kmeans(df,3,nstart=25)
> fit.km$size
> fit.km$centers
> aggregate(wine[-1],by=list(cluster=fit.km$cluster),mean)
  • partitioning around medoids (PAM)
> library(cluster)
> set.seed(1234)
> fit.pam<-pam(wine[-1],k=3,stand=TRUE)
> fit.pam$medoids
> clusplot(fit.pam,main="Bivariate Cluster Plot")
> ct.pam<-table(wine$Type,fit.pam$clustering)
> library(flexclust)
> randIndex(ct.pam)

5. avoiding non-existent clusters

> library(fMultivar)
> set.seed(1234)
> df<-rnorm2d(1000,rho=.5)
> df<-as.data.frame(df)
> plot(df,main="Bivariate Normal Distribution with rho=0.5")
> wssplot(df)
> library(NbClust)
> nc<-NbClust(df,min.nc=2,max.nc=15,method="kmeans")
> dev.new()
> barplot(table(nc$Best.n[1,]),
              xlab="Number of Clusters",ylab="Number of Criteria",
              main="Number of Clusters Chosen by 26 Criteria")
> library(ggplot2)
> library(cluster)
> fit<-pam(df,k=2)
> df$clustering<-factor(fit$clustering)
> ggplot(data=df,aes(x=V1,y=V2,color=clustering,shape=clustering)) +
             geom_point() + ggtitle("Clustering of Bivariate Normal Data")
> plot(nc$All.index[,4],type="o",ylab="CCC",xlab="Number of clusters",col="blue")

[III] Classification

data preparation

> loc<-"http://archive.ics.uci.edu/ml/machine-learning-databases/"
> ds<-"breast-cancer-wisconsin/breast-cancer-wisconsin.data"
> url<-paste(loc,ds,sep="")
> breast<-read.table(url,sep=",",header=FALSE,na.strings="?")
> names(breast)<-c("ID","clumpThickness","sizeUniformity",
                              "shapeUniformity","marginalAdhesion",
                              "singleEpithelialCellSize","bareNuclei",
                              "blandChromatin","normalNucleoli","mitosis","class")
> df<-breast[-1]
> df$class<-factor(df$class,levels=c(2,4),
                           labels=c("benign","malignant"))
> set.seed(1234)
> train<-sample(nrow(df),0.7*nrow(df))
> df.train<-df[train,]
> df.validate<-df[-train,]
> table(df.train$class)
> table(df.validate$class)

logistic regression

> fit.logit<-glm(class~.,data=df.train,family=binomial())
> summary(fit.logit)
> prob<-predict(fit.logit,df.validate,type="response")
> logit.pred<-factor(prob>.5,levels=c(FALSE,TRUE),
                             labels=c("benign","malignant"))
> logit.perf<-table(df.validate$class,logit.pred,
                            dnn=c("Actual","Predicted"))
> logit.perf

decision tree

  • classic decision tree
> library(rpart)
> set.seed(1234)
> dtree<-rpart(class~.,data=df.train,method="class",
                       parms=list(split="information"))
> dtree$cptable
> plotcp(dtree)
> dtree.pruned<-prune(dtree,cp=.0125)
> library(rpart.plot)
> prp(dtree.pruned,type=2,extra=104,
        fallen.leaves=TRUE,main="Decision Tree")
> dtree.pred<-predict(dtree.pruned,df.validate,type="class")
> dtree.perf<-table(df.validate$class,dtree.pred,
                             dnn=c("Actual","Predicted"))
> dtree.perf
  • conditional inference tree
> library(party)
> fit.ctree<-ctree(class~.,data=df.train)
> plot(fit.ctree,main="Conditional Inference Tree")
> ctree.pred<-predict(fit.ctree,df.validate,type="response")
> ctree.perf<-table(df.validate$class,ctree.pred,
                            dnn=c("Actual","Predicted"))
> ctree.perf

random forest

> library(randomForest)
> set.seed(1234)
> fit.forest<-randomForest(class~.,data=df.train,na.action=na.roughfix,importance=TRUE)
> fit.forest
> importance(fit.forest,type=2)
> forest.pred<-predict(fit.forest,df.validate)
> forest.perf<-table(df.validate$class,forest.pred,
                              dnn=c("Actual","Predicted"))
> forest.perf

support vector machines

  • svm
> library(e1071)
> set.seed(1234)
> fit.svm<-svm(class~.,data=df.train)
> fit.svm
> svm.pred<-predict(fit.svm,na.omit(df.validate))
> svm.perf<-table(na.omit(df.validate)$class,
                           svm.pred,dnn=c("Actual","Predicted"))
> svm.perf
  • svm model with rbf core
> set.seed(1234)
> tuned<-tune.svm(class~.,data=df.train,gamma=10^(-6:1),cost=10^(-10:10))
> tuned
> fit.svm<-svm(class~.,data=df.train,gamma=.01,cost=1)
> svm.pred<-predict(fit.svm,na.omit(df.validate))
> svm.perf<-table(na.omit(df.validate)$class,
                            svm.pred,dnn=c("Actual","Predicted"))
> svm.perf

choose the best solution for forecasting

> performance<-function(table,n=2){
        if(!all(dim(table)==c(2,2)))
              stop("Must be a 2 × 2 table")
   tn=table[1,1]
   fp=table[1,2]
   fn=table[2,1]
   tp=table[2,2]
   sensitivity=tp/(tp+fn)
   specificity=tn/(tn+fp)
   ppp=tp/(tp+fp)
   npp=tn/(tn+fn)
   hitrate=(tp+tn)/(tp+tn+fp+fn)
   result<-paste("Sensitivity=",round(sensitivity,n),
           "\nSpecificity=",round(specificity,n),
           "\nPositive Predictive Value=",round(ppp,n),
           "\nNegative Predictive Value=",round(npp,n),
           "\nAccuracy=",round(hitrate,n),"\n",sep="")
   cat(result)
  }
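
Applied to the confusion matrices built above, for example:

> performance(logit.perf)
> performance(dtree.perf)
> performance(ctree.perf)
> performance(forest.perf)
> performance(svm.perf)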

data mining with the rattle package

> install.packages("rattle")
> rattle()
> loc<-"http://archive.ics.uci.edu/ml/machine-learning-databases/"
> ds<-"pima-indians-diabetes/pima-indians-diabetes.data"
> url <-paste(loc,ds,sep="")
> diabetes<-read.table(url,sep=",",header=FALSE)
> names(diabetes)<-c("npregant","plasma","bp","triceps",
                                  "insulin","bmi","pedigree","age","class")
> diabetes$class<-factor(diabetes$class,levels=c(0,1),
                                      labels=c("normal","diabetic"))
> library(rattle)
> rattle()
> cv<-matrix(c(145,50,8,27),nrow=2)
> performance(as.table(cv))

[IV] Time Series

Function | Package | Description
ts() | stats | generates a time-series object
plot() | graphics | draws a line graph of the time series
start() | stats | returns the start time of the time series
end() | stats | returns the end time of the time series
frequency() | stats | returns the number of observations per unit of time
window() | stats | subsets a time-series object
ma() | forecast | fits a simple moving-average model
stl() | stats | decomposes a time series into seasonal, trend, and irregular components using LOESS smoothing
monthplot() | stats | plots the seasonal components of a time series
seasonplot() | forecast | generates a season plot
HoltWinters() | stats | fits an exponential smoothing model
forecast() | forecast | predicts future values of a time series
accuracy() | forecast | returns goodness-of-fit measures for a time-series model
ets() | forecast | fits an exponential smoothing model, automatically selecting the optimal model
lag() | stats | returns the series shifted by a specified number of lags
Acf() | forecast | estimates the autocorrelation function
Pacf() | forecast | estimates the partial autocorrelation function
diff() | base | returns lagged differences of the series
ndiffs() | forecast | finds the optimal number of differences to remove trend from the series
adf.test() | tseries | performs an ADF test of whether the series is stationary
arima() | stats | fits an ARIMA model
Box.test() | stats | performs a Ljung-Box test of whether the model's residuals are independent
bds.test() | tseries | performs a BDS test of whether the series values are independent and identically distributed
auto.arima() | forecast | automatically selects the best-fitting ARIMA model
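
A short end-to-end sketch tying several of these together, using the built-in AirPassengers series as an assumed example (any monthly ts object works):

> library(forecast)
> plot(stl(log(AirPassengers),s.window="periodic"))   # seasonal, trend, remainder components
> fit <- auto.arima(AirPassengers)                    # automatic ARIMA model selection
> forecast(fit,12)                                    # forecast the next 12 months
> accuracy(fit)                                       # goodness-of-fit measures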

END!
