R Language: Regression



Usage

## S3 method for class 'formula'
pairs(formula, data = NULL, ..., subset,
      na.action = stats::na.pass)

## Default S3 method:
pairs(x, labels, panel = points, ...,
      lower.panel = panel, upper.panel = panel,
      diag.panel = NULL, text.panel = textPanel,
      label.pos = 0.5 + has.diag/3, line.main = 3,
      cex.labels = NULL, font.labels = 1,
      row1attop = TRUE, gap = 1, log = "")

Arguments

x

the columns of a given data frame (or matrix) to plot; each column becomes one variable in the scatterplot matrix.

formula

a formula specifying the variables to plot, such as ~ x + y + z. Each term will give a separate variable in the pairs plot, so terms should be numeric vectors. (A response will be interpreted as another variable, but not treated specially, so it is confusing to use one.)

data

a data.frame (or list) from which the variables in formula should be taken.

subset

an optional condition selecting which rows (observations) to plot, e.g. subset = Education < 20.

na.action

a function which indicates what should happen when the data contain NAs. The default is to pass missing values on to the panel functions, but na.action = na.omit will cause cases with missing values in any of the variables to be omitted entirely.

labels

the names of the variables.

main

the main title of the plot.

panel

function(x, y, ...) which is used to plot the contents of each panel of the display.

...

arguments to be passed to or from methods. Graphical parameters can also be given here, as can arguments to plot such as main. par("oma") will be set appropriately unless specified.

lower.panel, upper.panel

separate panel functions (or NULL) to be used below and above the diagonal respectively. For example, upper.panel = NULL draws only the lower triangle.

diag.panel

optional function(x, ...) to be applied on the diagonals.

text.panel

optional function(x, y, labels, cex, font, ...) to be applied on the diagonals.

label.pos

y position of labels in the text panel.

line.main

if main is specified, line.main gives the line argument to mtext() which draws the title. You may want to specify oma when changing line.main.

cex.labels, font.labels

graphics parameters for the text panel.

row1attop

logical. Should the layout be matrix-like with row 1 at the top, or graph-like with row 1 at the bottom?

gap

distance between subplots, in margin lines.

log

either a character string indicating whether logarithmic axes are to be used (see plot.default), or a numeric vector of indices of the variables that should get logarithmic axes for both x and y (e.g. log = 1:4). log = "xy" specifies logarithmic axes for all variables.
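The sketch below makes the formula/data/subset arguments concrete. It assumes base R and the built-in swiss data. The helper panel.record is hypothetical, not part of R: it stores the (x, y) values each off-diagonal panel receives. The sketch compares the formula call using subset = Education < 20 with the default method applied to a data frame subsetted by hand. Under these assumptions, both calls should hand the panels the same data.

```r
## panel.record is a hypothetical helper: it stores the (x, y) values each
## off-diagonal panel receives, then draws the points as usual.
seen <- list()
panel.record <- function(x, y, ...) {
  seen <<- c(seen, list(cbind(x, y)))
  points(x, y, ...)
}

pdf(NULL)  # null graphics device: nothing is written to disk

## formula method: variables come from the formula, rows from subset
pairs(~ Fertility + Education + Catholic, data = swiss,
      subset = Education < 20, panel = panel.record)
via_formula <- seen

## default method on a data frame that was subsetted by hand
seen <- list()
pairs(swiss[swiss$Education < 20, c("Fertility", "Education", "Catholic")],
      panel = panel.record)
via_default <- seen

invisible(dev.off())

## both routes should have passed identical data to the 6 off-diagonal panels
isTRUE(all.equal(via_formula, via_default))
```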

 

Details

The (i, j)th scatterplot contains x[,i] plotted against x[,j]. The scatterplot can be customised by setting panel functions to appear as something completely different. The off-diagonal panel functions are passed the appropriate columns of x as x and y: the diagonal panel function (if any) is passed a single column, and the text.panel function is passed a single (x, y) location and the column name. Setting some of these panel functions to NULL is equivalent to not drawing anything there.
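You can check these rules directly: which function runs in which cell, and that NULL leaves cells empty. The sketch below uses base R only. The names lower.fn, upper.fn and diag.fn are hypothetical counting panels, not R functions. With the 4 iris measurements you would expect 6 cells below the diagonal, 6 above it, and 4 on it.

```r
## hypothetical counting panels: each one records how often pairs() calls it
n_lower <- 0; n_upper <- 0; n_diag <- 0
lower.fn <- function(x, y, ...) { n_lower <<- n_lower + 1; points(x, y, ...) }
upper.fn <- function(x, y, ...) { n_upper <<- n_upper + 1; points(x, y, ...) }
diag.fn  <- function(x, ...)    { n_diag  <<- n_diag  + 1 }

pdf(NULL)  # null graphics device: nothing is written to disk
pairs(iris[1:4], lower.panel = lower.fn, upper.panel = upper.fn,
      diag.panel = diag.fn)
full <- c(lower = n_lower, upper = n_upper, diag = n_diag)

## upper.panel = NULL: the cells above the diagonal are simply left empty
n_lower <- 0; n_upper <- 0; n_diag <- 0
pairs(iris[1:4], lower.panel = lower.fn, upper.panel = NULL)
half <- c(lower = n_lower, upper = n_upper, diag = n_diag)
invisible(dev.off())

full  # 6 below, 6 above, 4 on the diagonal
half  # 6 below, none above, no diagonal panel given
```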

 

The graphical parameters pch and col can be used to specify a vector of plotting symbols and colors to be used in the plots.

 

The graphical parameter oma will be set by pairs.default unless supplied as an argument.

 

A panel function should not attempt to start a new plot, but just plot within a given coordinate system: thus plot and boxplot are not panel functions.
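The sketch below shows this rule being enforced, using base R only. The name bad.panel is a hypothetical example of what not to do. The panel calls plot(), which moves to a new figure, so pairs() stops with an error instead of drawing a scrambled matrix.

```r
## hypothetical "panel" that wrongly starts a new plot
bad.panel <- function(x, y, ...) plot(x, y)

pdf(NULL)  # null graphics device: nothing is written to disk
res <- tryCatch(pairs(iris[1:4], panel = bad.panel),
                error = function(e) e)
invisible(dev.off())

inherits(res, "error")
conditionMessage(res)  # the reason pairs() gives for stopping
```

Drawing with low-level functions such as points(), lines(), abline() or text() inside the panel avoids the problem.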

 

By default, missing values are passed to the panel functions and will often be ignored within a panel. However, for the formula method with na.action = na.omit, all cases which contain a missing value for any of the variables are omitted completely (including when the scales are selected).
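You can see the difference between the default na.pass and na.omit by counting how many values each panel receives. This is a sketch under stated assumptions:

- dat is a small made-up data frame with two missing values.
- panel.count is a hypothetical helper, not part of R.

```r
## made-up data: row 2 is missing b, row 3 is missing a
dat <- data.frame(a = c(1, 2, NA, 4, 5),
                  b = c(2, NA, 6, 8, 10),
                  c = c(1, 3, 5, 7, 9))

## hypothetical helper: remember how many values each off-diagonal panel gets
n_seen <- integer(0)
panel.count <- function(x, y, ...) {
  n_seen <<- c(n_seen, length(x))
  points(x, y, ...)
}

pdf(NULL)  # null graphics device: nothing is written to disk
pairs(~ a + b + c, data = dat, panel = panel.count)   # default: na.pass
pass_n <- n_seen

n_seen <- integer(0)
pairs(~ a + b + c, data = dat, panel = panel.count, na.action = na.omit)
omit_n <- n_seen
invisible(dev.off())

pass_n  # every panel sees all 5 rows, NAs included
omit_n  # rows 2 and 3 are dropped everywhere, so every panel sees 3 rows
```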

 

 

 

Examples:

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
      pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

## formula method
pairs(~ Fertility + Education + Catholic, data = swiss,
      subset = Education < 20, main = "Swiss data, Education < 20")

Only the rows with Education < 20 are plotted.

pairs(USJudgeRatings, text.panel = NULL, upper.panel = NULL)

The upper triangle is not drawn, and text.panel = NULL also leaves the variable names off the diagonal.

## put histograms on the diagonal
panel.hist <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))  # restore the panel coordinates on exit
    par(usr = c(usr[1:2], 0, 1.5))              # y from 0 to 1.5 leaves room for the label
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)                # scale bar heights to at most 1
    rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(USJudgeRatings[1:5], panel = panel.smooth,
      cex = 1.5, pch = 24, bg = "light blue",
      diag.panel = panel.hist, cex.labels = 2, font.labels = 2)
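The same pattern works for other one-variable summaries. As a sketch, assuming base R only, the hypothetical panel.dens below draws a kernel density estimate from density() on the diagonal instead of a histogram. It rescales the y coordinates the same way panel.hist does, so the curve leaves room for the variable name.

```r
## hypothetical diagonal panel: kernel density curve instead of a histogram
panel.dens <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))   # restore panel coordinates on exit
  d <- density(x, na.rm = TRUE)                # Gaussian kernel, default bandwidth
  par(usr = c(usr[1:2], 0, max(d$y) * 1.5))    # same headroom idea as panel.hist
  lines(d$x, d$y, col = "darkblue")
}

pairs(USJudgeRatings[1:5], panel = panel.smooth, diag.panel = panel.dens)
```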

 

## put (absolute) correlations on the upper panels,
## with size proportional to the correlations.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))  # restore the panel coordinates on exit
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)
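The line format(c(r, 0.123456789), digits = digits)[1] in panel.cor is a small trick. Formatting r together with a dummy value that needs two decimals makes format() pad r with trailing zeros, so 0.5 is shown as "0.50" rather than "0.5". A quick base R check:

```r
r <- 0.5
format(r, digits = 2)                     # "0.5": trailing zero dropped
format(c(r, 0.123456789), digits = 2)     # "0.50" "0.12": shared number of decimals
format(c(r, 0.123456789), digits = 2)[1]  # "0.50", the label panel.cor draws
format(c(1, 0.123456789), digits = 2)[1]  # "1.00"
```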

 

pairs(iris[-5], log = "xy") # plot all variables on log scale
pairs(iris, log = 1:4, # log the first four
      main = "Lengths and Widths in [log]", line.main=1.5, oma=c(2,2,3,2))
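Logarithmic axes cannot show zero or negative values. With mixed data, it can help to compute the numeric log indices instead of typing them. This is a sketch on simulated data; dat and pos are hypothetical names.

```r
set.seed(1)
dat <- data.frame(a = exp(rnorm(50)),  # strictly positive
                  b = rnorm(50),       # contains negative values
                  c = rexp(50))        # strictly positive

## indices of the columns whose values are all > 0
pos <- unname(which(vapply(dat, function(v) all(v > 0), logical(1))))
pos  # 1 3

pairs(dat, log = pos)  # log axes only for a and c
```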

## trackFeaturesTable is not shipped with R (presumably the author's own data)
pairs(trackFeaturesTable[1:2], main = "Edgar Anderson's Iris Data", pch = 21,
      panel = panel.smooth)
#     optional extras: diag.panel = panel.hist, bg = c("red", "green3", "yellow")[unclass(trackFeaturesTable$Species)]

library(car)
scatterplotMatrix(~ DSpeed8 + Espeed8 | Species, data = trackFeaturesTable)

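Scatterplot Matrices (package car)

Description

Scatterplot matrices with univariate displays down the diagonal; spm is an abbreviation for scatterplot.matrix. This function just sets up a call to pairs.

This help text comes from an older release of car. Newer releases call the function scatterplotMatrix (the name used in the call above) and have renamed or regrouped several of the arguments listed below, so check ?scatterplotMatrix for the installed version.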

Usage

scatterplot.matrix(x, ...)

## S3 method for class 'formula':
scatterplot.matrix(formula, data=NULL, subset, ...)

## Default S3 method:
scatterplot.matrix(x, labels=colnames(x), 
    diagonal=c("density", "boxplot", "histogram", "oned", "qqplot", "none"), 
    adjust=1, nclass, plot.points=TRUE, smooth=TRUE, span=0.5, reg.line=lm, 
    transform=FALSE, ellipse=FALSE, levels=c(.5, .9), robust=FALSE,
    groups=FALSE, by.groups=FALSE, col=palette(), 
    pch=1:n.groups, lwd=1, lwd.smooth=lwd,
    cex=par("cex"), cex.axis=par("cex.axis"), cex.labels=NULL, 
    cex.main=par("cex.main"),
    legend.plot=length(levels(groups)) > 1, ...)

spm(x, ...)

Arguments

x: a data matrix, numeric data frame, or formula.
formula: a one-sided "model" formula, of the form ~ x1 + x2 + ... + xk or ~ x1 + x2 + ... + xk | z, where z evaluates to a factor or other variable to divide the data into groups.
data: for scatterplot.matrix.formula, a data frame within which to evaluate the formula.
subset: expression defining a subset of observations.
labels: variable labels (for the diagonal of the plot).
diagonal: contents of the diagonal panels of the plot.
adjust: relative bandwidth for density estimate, passed to density function.
nclass: number of bins for histogram, passed to hist function.
plot.points: if TRUE the points are plotted in each off-diagonal panel.
smooth: if TRUE a lowess smooth is plotted in each off-diagonal panel.
span: span for lowess smoother.
reg.line: if not FALSE a line is plotted using the function given by this argument; e.g., using rlm in package MASS plots a robust-regression line.
transform: if TRUE, multivariate normalizing Box-Cox transformations are computed and plotted; if a vector of powers, one for each variable, these are applied as Box-Cox power transformations prior to plotting.
ellipse: if TRUE data-concentration ellipses are plotted in the off-diagonal panels.
levels: level or levels at which concentration ellipses are plotted; the default is c(.5, .9).
robust: if TRUE use the cov.trob function in the MASS package to calculate the center and covariance matrix for the data ellipse.
groups: a factor or other variable dividing the data into groups; groups are plotted with different colors and plotting characters.
by.groups: if TRUE, regression lines are fit by groups.
pch: plotting characters for points; default is the plotting characters in order (see par).
col: colors for points and lines; the default is the colors in the current color palette, starting at the second entry (see palette and par).
lwd: width of linear-regression lines (default 1).
lwd.smooth: width for smooth regression lines (default is the same as lwd).
cex, cex.axis, cex.labels, cex.main: set sizes of various graphical elements (see par).
legend.plot: if TRUE then a legend for the groups is plotted in the bottom-right cell.
...: arguments to pass down.

Scatterplot matrices in R

July 25, 2011
By Stephen Turner

(This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers)

I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is pairs(). Producing these plots can be helpful in exploring your data, especially using the second method below.

Try it out on the built-in iris dataset. (The data set gives the measurements in cm of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.)

# Load the iris dataset.
data(iris)
 
# Plot #1: Basic scatterplot matrix of the four measurements
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris)

Looking at the pairs help page, I found that there's another built-in function, panel.smooth(), that can be used to plot a lowess smoother curve for each plot in a scatterplot matrix. Pass this function to the lower.panel argument of the pairs function. The panel.cor() function below (taken from the examples on the same help page) computes the absolute correlation between each pair of variables and displays it in the upper panels, with the font size proportional to the absolute value of the correlation.

# panel.smooth function is built in.
# panel.cor puts correlation in upper panels, size proportional to correlation
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
 
# Plot #2: same as above, but add a lowess smoother in the lower panels and correlations in the upper panels
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris,
      lower.panel=panel.smooth, upper.panel=panel.cor, 
      pch=20, main="Iris Scatterplot Matrix")
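For reference, panel.smooth() draws the points and then a lowess() curve. By default it uses span f = 2/3 and 3 robustness iterations. The minimal base R sketch below computes the curve for one panel. It also defines a hypothetical wrapper, panel.smooth.narrow, that passes a smaller span through panel.smooth's span argument.

```r
## the curve panel.smooth() adds to the Sepal.Length / Petal.Length panel
fit <- lowess(iris$Sepal.Length, iris$Petal.Length, f = 2/3, iter = 3)
str(fit)  # list with x (sorted) and the smoothed y

## hypothetical wrapper: a wigglier smoother via panel.smooth's span argument
panel.smooth.narrow <- function(x, y, ...) panel.smooth(x, y, span = 0.3, ...)

pairs(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris,
      lower.panel = panel.smooth.narrow, upper.panel = NULL, pch = 20)
```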

Finally, you can produce a similar plot using ggplot2, with the diagonal showing the kernel density.

# Plot #3: similar plot using ggplot2
# Note: plotmatrix() has since been removed from ggplot2; ggpairs() from the
# GGally package (built on ggplot2) is the usual replacement.
# install.packages("GGally") ## uncomment to install GGally
library(GGally)
ggpairs(iris, columns = 1:4)


