Tips for Drawing Scatterplot Matrices in R - CSDN Blog

## S3 method for class 'formula'
pairs(formula, data = NULL, ..., subset,
      na.action = stats::na.pass)

## Default S3 method:
pairs(x, labels, panel = points, ...,
      lower.panel = panel, upper.panel = panel,
      diag.panel = NULL, text.panel = textPanel,
      label.pos = 0.5 + has.diag/3, line.main = 3,
      cex.labels = NULL, font.labels = 1,
      row1attop = TRUE, gap = 1, log = "")

Arguments

Plots the columns of a given data frame.

formula

specifies the variables to plot, in the form ~ x + y + z. Each term will give a separate variable in the pairs plot, so terms should be numeric vectors. (A response will be interpreted as another variable, but not treated specially, so it is confusing to use one.)

data

a data.frame (or list) from which the variables in formula should be taken.

subset

extracts the observations that satisfy a condition on some column.

na.action

a function which indicates what should happen when the data contain NAs. The default is to pass missing values on to the panel functions, but na.action = na.omit will cause cases with missing values in any of the variables to be omitted entirely.

labels

the names of the variables.

main

the main title of the plot.

panel

function(x, y, ...) which is used to plot the contents of each panel of the display.

...

arguments to be passed to or from methods.

Also, graphical parameters can be given as can arguments to plot such as main. par("oma") will be set appropriately unless specified.

lower.panel, upper.panel

separate panel functions (or NULL) to be used below and above the diagonal respectively.

upper.panel = NULL draws only the lower triangle of panels.

diag.panel

optional function(x, ...) to be applied on the diagonals.

text.panel

optional function(x, y, labels, cex, font, ...) to be applied on the diagonals.

label.pos

y position of labels in the text panel.

line.main

if main is specified, line.main gives the line argument to mtext() which draws the title. You may want to specify oma when changing line.main.

cex.labels, font.labels

graphics parameters for the text panel.

row1attop

logical. Should the layout be matrix-like with row 1 at the top, or graph-like with row 1 at the bottom?

gap

distance between subplots, in margin lines.

log

a character string indicating if logarithmic axes are to be used: see plot.default. log = "xy" specifies logarithmic axes for all variables.
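
As a rough illustration of the text.panel argument described above, the sketch below replaces the default label drawer with a hypothetical function, upperLabels (not part of R), that prints each variable name in upper case. It relies only on the documented signature function(x, y, labels, cex, font, ...).

```r
# Hypothetical text.panel: draw each diagonal label in upper case.
# pairs() calls it once per variable with a single (x, y) location.
seen <- character(0)
upperLabels <- function(x, y, labels, cex, font, ...) {
  seen <<- c(seen, labels)            # record which label was drawn
  text(x, y, toupper(labels), cex = cex, font = font)
}

pdf(NULL)                             # off-screen device; remove to draw on screen
pairs(iris[1:4], text.panel = upperLabels, gap = 0.5)
invisible(dev.off())
```

After the call, seen holds the four column names that were handed to the text panel, one per diagonal cell.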

Details

The ijth scatterplot contains x[,i] plotted against x[,j]. The scatterplot can be customised by setting panel functions to appear as something completely different. The off-diagonal panel functions are passed the appropriate columns of x as x and y: the diagonal panel function (if any) is passed a single column, and the text.panel function is passed a single (x, y) location and the column name. Setting some of these panel functions to NULL is equivalent to not drawing anything there.
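
A minimal sketch of a custom off-diagonal panel function, assuming a hypothetical helper named panel.lm (it is not part of base R): it receives two columns as x and y, draws the points, and adds a least-squares line inside the coordinate system that pairs() has already set up.

```r
# Hypothetical panel function: scatter points plus a least-squares line.
# It draws into the existing panel and never starts a new plot.
calls <- 0
panel.lm <- function(x, y, ...) {
  calls <<- calls + 1                 # count how many panels are drawn
  points(x, y, ...)
  ok <- is.finite(x) & is.finite(y)   # ignore missing values
  if (sum(ok) >= 2) abline(lm(y[ok] ~ x[ok]), col = "red")
}

pdf(NULL)                             # off-screen device; remove to draw on screen
pairs(iris[1:4], lower.panel = panel.lm, upper.panel = NULL, pch = 20)
invisible(dev.off())
```

With four variables there are six cells below the diagonal, so panel.lm is invoked six times; the upper triangle stays empty because upper.panel is NULL.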

The graphical parameters pch and col can be used to specify a vector of plotting symbols and colors to be used in the plots.

The graphical parameter oma will be set by pairs.default unless supplied as an argument.

A panel function should not attempt to start a new plot, but just plot within a given coordinate system: thus plot and boxplot are not panel functions.

By default, missing values are passed to the panel functions and will often be ignored within a panel. However, for the formula method and na.action = na.omit, all cases which contain a missing value for any of the variables are omitted completely (including when the scales are selected).
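
The difference between the two na.action settings can be sketched with a small made-up data frame (df and its values are invented for this illustration). model.frame() is called here only to count how many rows each setting keeps.

```r
# Small made-up data frame with missing values in two different rows.
df <- data.frame(a = c(1, 2, NA, 4, 5),
                 b = c(2, NA, 3, 5, 4),
                 c = c(5, 4, 3, 2, 1))

# model.frame() is used here only to count the rows each na.action keeps.
n_pass <- nrow(model.frame(~ a + b + c, data = df, na.action = na.pass))
n_omit <- nrow(model.frame(~ a + b + c, data = df, na.action = na.omit))

pdf(NULL)                                            # off-screen device; remove to draw on screen
pairs(~ a + b + c, data = df)                        # NAs reach the panels
pairs(~ a + b + c, data = df, na.action = na.omit)   # incomplete rows dropped
invisible(dev.off())
```

With na.pass all five rows are kept and each panel skips only the points it cannot draw; with na.omit the two incomplete rows disappear from every panel, including those for column c, which has no missing values itself.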

Examples:

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
      pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

## formula method
pairs(~ Fertility + Education + Catholic, data = swiss,
      subset = Education < 20, main = "Swiss data, Education < 20")

Only the observations with Education < 20 are plotted.

pairs(USJudgeRatings, text.panel = NULL, upper.panel = NULL)

The upper triangle is not displayed.

## put histograms on the diagonal
panel.hist <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))  # restore the user coordinates on exit
    par(usr = c(usr[1:2], 0, 1.5) )
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)                # scale bar heights to at most 1
    rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(USJudgeRatings[1:5], panel = panel.smooth,
      cex = 1.5, pch = 24, bg = "light blue",
      diag.panel = panel.hist, cex.labels = 2, font.labels = 2)

## put (absolute) correlations on the upper panels,
## with size proportional to the correlations.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))  # restore the user coordinates on exit
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)

pairs(iris[-5], log = "xy") # plot all variables on log scale
pairs(iris, log = 1:4, # log the first four
      main = "Lengths and Widths in [log]", line.main=1.5, oma=c(2,2,3,2))

## trackFeaturesTable is the author's own data set; it is not defined in this post
pairs(trackFeaturesTable[1:2], main = "Edgar Anderson's Iris Data", pch = 21,
      panel = panel.smooth)
# optional extra arguments:
#   diag.panel = panel.hist,
#   bg = c("red", "green3", "yellow")[unclass(trackFeaturesTable$Species)]

library(car)
scatterplotMatrix(~ DSpeed8 + Espeed8 | Species, data = trackFeaturesTable)

Scatterplot Matrices

Description

Scatterplot matrices with univariate displays down the diagonal; spm is an abbreviation for scatterplot.matrix. This function just sets up a call to pairs.

Usage

scatterplot.matrix(x, ...)

## S3 method for class 'formula':
scatterplot.matrix(formula, data=NULL, subset, ...)

## Default S3 method:
scatterplot.matrix(x, labels=colnames(x), 
    diagonal=c("density", "boxplot", "histogram", "oned", "qqplot", "none"), 
    adjust=1, nclass, plot.points=TRUE, smooth=TRUE, span=0.5, reg.line=lm, 
    transform=FALSE, ellipse=FALSE, levels=c(.5, .9), robust=FALSE,
    groups=FALSE, by.groups=FALSE, col=palette(), 
    pch=1:n.groups, lwd=1, lwd.smooth=lwd,
    cex=par("cex"), cex.axis=par("cex.axis"), cex.labels=NULL, 
    cex.main=par("cex.main"),
    legend.plot=length(levels(groups)) > 1, ...)

spm(x, ...)

Arguments

`x`	a data matrix, numeric data frame, or formula.
`formula`	a one-sided "model" formula, of the form `~ x1 + x2 + ... + xk` or `~ x1 + x2 + ... + xk | z` where `z` evaluates to a factor or other variable to divide the data into groups.
`data`	for `scatterplot.matrix.formula`, a data frame within which to evaluate the formula.
`subset`	expression defining a subset of observations.
`labels`	variable labels (for the diagonal of the plot).
`diagonal`	contents of the diagonal panels of the plot.
`adjust`	relative bandwidth for density estimate, passed to `density` function.
`nclass`	number of bins for histogram, passed to `hist` function.
`plot.points`	if `TRUE` the points are plotted in each off-diagonal panel.
`smooth`	if `TRUE` a lowess smooth is plotted in each off-diagonal panel.
`span`	span for lowess smoother.
`reg.line`	if not `FALSE` a line is plotted using the function given by this argument; e.g., using `rlm` in package `MASS` plots a robust-regression line.
`transform`	if `TRUE`, multivariate normalizing Box-Cox transformations are computed and plotted; if a vector of powers, one for each variable, these are applied as Box-Cox power transformations prior to plotting.
`ellipse`	if `TRUE` data-concentration ellipses are plotted in the off-diagonal panels.
`levels`	level or levels at which concentration ellipses are plotted; the default is `c(.5, .9)`.
`robust`	if `TRUE` use the `cov.trob` function in the `MASS` package to calculate the center and covariance matrix for the data ellipse.
`groups`	a factor or other variable dividing the data into groups; groups are plotted with different colors and plotting characters.
`by.groups`	if `TRUE`, regression lines are fit by groups.
`pch`	plotting characters for points; default is the plotting characters in order (see `par`).
`col`	colors for points and lines; the default is the current color palette, starting at the second entry (see `palette` and `par`).
`lwd`	width of linear-regression lines (default `1`).
`lwd.smooth`	width for smooth regression lines (default is the same as `lwd`).
`cex, cex.axis, cex.labels, cex.main`	set sizes of various graphical elements (see `par`).
`legend.plot`	if `TRUE` then a legend for the groups is plotted in the bottom-right cell.
`...`	arguments to pass down.
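
The usage above comes from an older release of the car package. In more recent releases the function is named scatterplotMatrix(), and several of the arguments listed here have changed, so the sketch below passes only a formula and data. It assumes car is installed and is skipped otherwise.

```r
# Assumes the car package is installed; the plot is skipped otherwise.
car_available <- requireNamespace("car", quietly = TRUE)

if (car_available) {
  pdf(NULL)                           # off-screen device; remove to draw on screen
  # One-sided formula; the part after "|" defines the groups.
  car::scatterplotMatrix(~ Sepal.Length + Sepal.Width + Petal.Length +
                           Petal.Width | Species, data = iris)
  invisible(dev.off())
}
```

Each species is drawn with its own color and plotting character, and the diagonal cells show a univariate display for each variable.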

Scatterplot matrices in R

July 25, 2011

By Stephen Turner


(This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers)

I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is pairs(). Producing these plots can be helpful in exploring your data, especially using the second method below.

Try it out on the built-in iris dataset. (This data set gives the measurements in cm of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.)

# Load the iris dataset.
data(iris)
 
# Plot #1: Basic scatterplot matrix of the four measurements
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris)

Looking at the pairs help page, I found that there's another built-in function, panel.smooth(), that can be used to plot a smooth curve (computed with lowess) for each plot in a scatterplot matrix. Pass this function to the lower.panel argument of the pairs function. The panel.cor() function below computes the absolute correlation between pairs of variables and displays it in the upper panels, with the font size proportional to the absolute value of the correlation.

# panel.smooth function is built in.
# panel.cor puts correlation in upper panels, size proportional to correlation
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))  # restore the user coordinates on exit
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
 
# Plot #2: same as above, but add lowess smoother in lower and correlation in upper
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris,
      lower.panel=panel.smooth, upper.panel=panel.cor, 
      pch=20, main="Iris Scatterplot Matrix")
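
The numbers drawn in the upper panels can be checked against the correlation matrix itself. This short sketch computes the absolute Pearson correlations for the same four columns; rounding to two decimals is a choice made here and only approximates the formatting panel.cor applies.

```r
# Absolute Pearson correlations for the four iris measurements,
# rounded to two decimals for easy comparison with the plot.
r <- round(abs(cor(iris[1:4])), 2)
print(r)
```

The largest off-diagonal entry belongs to Petal.Length and Petal.Width, which is why that cell carries the biggest text in the plot.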

Finally, you can produce a similar plot using ggplot2, with the diagonal showing the kernel density.

# Plot #3: similar plot using ggplot2
# install.packages("ggplot2") ## uncomment to install ggplot2
library(ggplot2)
plotmatrix(with(iris, data.frame(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)))
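
Note that plotmatrix() is no longer available in current ggplot2 releases, so the call above only works with old versions. A comparable plot can be sketched with ggpairs() from the GGally package, which builds on ggplot2; this assumes GGally is installed and is a substitute, not the original author's code.

```r
# Assumes the GGally package is installed; the plot is skipped otherwise.
has_ggally <- requireNamespace("GGally", quietly = TRUE)

if (has_ggally) {
  # Scatterplot matrix of the four measurements (columns 1 to 4 of iris).
  p <- GGally::ggpairs(iris, columns = 1:4)
  pdf(NULL)                           # off-screen device; remove to draw on screen
  print(p)
  invisible(dev.off())
}
```

By default ggpairs() puts density curves on the diagonal for continuous variables, which matches the kernel-density diagonal described above.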
