## S3 method for class 'formula'
pairs(formula, data = NULL, ..., subset,
na.action = stats::na.pass)
## Default S3 method:
pairs(x, labels, panel = points, ...,
lower.panel = panel, upper.panel = panel,
diag.panel = NULL, text.panel = textPanel,
label.pos = 0.5 + has.diag/3, line.main = 3,
cex.labels = NULL, font.labels = 1,
row1attop = TRUE, gap = 1, log = "")
Arguments
x
the columns of the given data frame (or matrix) to plot.
formula
a formula specifying the variables to plot, such as ~ x + y + z. Each term will give a separate variable in the pairs plot, so terms should be numeric vectors. (A response will be interpreted as another variable, but not treated specially, so it is confusing to use one.)
data
a data.frame (or list) from which the variables in formula should be taken.
subset
an optional condition selecting the rows to plot, for example a test on one of the columns.
na.action
a function which indicates what should happen when the data contain NAs. The default is to pass missing values on to the panel functions, but na.action = na.omit will cause cases with missing values in any of the variables to be omitted entirely.
labels
the names of the variables.
main
the main title of the plot.
panel
function(x, y, ...) which is used to plot the contents of each panel of the display.
...
arguments to be passed to or from methods.
Also, graphical parameters can be given as can arguments to plot such as main. par("oma") will be set appropriately unless specified.
lower.panel, upper.panel
separate panel functions (or NULL) to be used below and above the diagonal respectively.
Setting upper.panel = NULL draws only the lower triangle of panels.
diag.panel
optional function(x, ...) to be applied on the diagonals.
text.panel
optional function(x, y, labels, cex, font, ...) to be applied on the diagonals.
label.pos
y position of labels in the text panel.
line.main
if main is specified, line.main gives the line argument to mtext() which draws the title. You may want to specify oma when changing line.main.
cex.labels, font.labels
graphics parameters for the text panel.
row1attop
logical. Should the layout be matrix-like with row 1 at the top, or graph-like with row 1 at the bottom?
gap
distance between subplots, in margin lines.
log
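a character string indicating if logarithmic axes are to be used (see plot.default), or a numeric vector of indices of the variables that should get logarithmic axes for both x and y. log = "xy" specifies logarithmic axes for all variables.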
Details
The ijth scatterplot contains x[,i] plotted against x[,j]. The scatterplot can be customised by setting panel functions to appear as something completely different. The off-diagonal panel functions are passed the appropriate columns of x as x and y: the diagonal panel function (if any) is passed a single column, and the text.panel function is passed a single (x, y) location and the column name. Setting some of these panel functions to NULL is equivalent to not drawing anything there.
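To see which columns each off-diagonal panel receives, a panel function can record its arguments before drawing. This is a minimal sketch: the data frame d and the function record.panel are hypothetical names introduced here for illustration.

```r
# Small illustrative data frame (hypothetical, not from the help page).
# The first value of each column identifies it: a = 1, b = 10, c = 100.
d <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(100, 200, 300))

calls <- character(0)
record.panel <- function(x, y, ...) {
  # Note which columns arrived as x and y, then draw the points as usual.
  calls <<- c(calls, paste0("x=", x[1], ", y=", y[1]))
  points(x, y, ...)
}

pairs(d, panel = record.panel)
print(calls)  # one entry per off-diagonal panel
```

With three variables there are six off-diagonal panels, and every ordered pair of distinct columns shows up exactly once.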
The graphical parameters pch and col can be used to specify a vector of plotting symbols and colors to be used in the plots.
The graphical parameter oma will be set by pairs.default unless supplied as an argument.
A panel function should not attempt to start a new plot, but just plot within a given coordinate system: thus plot and boxplot are not panel functions.
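A panel function that follows this rule only adds to the plot region that pairs() has already set up. The sketch below draws the points and overlays a least-squares line; panel.lm and its col.line argument are hypothetical names, not built-in ones.

```r
# Hypothetical panel function: adds points and a fitted line to the
# existing panel using low-level calls only (no plot() or boxplot()).
panel.lm <- function(x, y, col.line = "red", ...) {
  points(x, y, ...)
  ok <- is.finite(x) & is.finite(y)
  # Fit and draw the line only when at least two complete pairs exist.
  if (sum(ok) > 1) abline(lm(y[ok] ~ x[ok]), col = col.line)
}

pairs(iris[1:4], panel = panel.lm, pch = 20)
```

Because points() and abline() are low-level functions, they draw into whatever coordinate system pairs() prepared for the current panel.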
By default, missing values are passed to the panel functions and will often be ignored within a panel. However, for the formula method and na.action = na.omit, all cases which contain a missing value for any of the variables are omitted completely (including when the scales are selected).
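The difference shows up on any data set with missing values. This sketch uses the built-in airquality data, which is not part of the original examples and is chosen here only because its Ozone and Solar.R columns contain NAs.

```r
vars <- c("Ozone", "Solar.R", "Wind")

# Default (na.pass): rows with NAs reach the panel functions, and
# points() skips the incomplete pairs panel by panel.
pairs(~ Ozone + Solar.R + Wind, data = airquality)

# na.omit: a row with an NA in any of the three variables is dropped
# from every panel, and the axis ranges are computed without it.
pairs(~ Ozone + Solar.R + Wind, data = airquality, na.action = na.omit)

# Number of rows that na.omit removes for these variables
n.dropped <- sum(!complete.cases(airquality[vars]))
print(n.dropped)
```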
Examples:
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
## formula method
pairs(~ Fertility + Education + Catholic, data = swiss,
subset = Education < 20, main = "Swiss data, Education < 20")
## subset keeps only the rows with Education < 20
pairs(USJudgeRatings, text.panel = NULL, upper.panel = NULL)
## the upper triangle is not drawn
## put histograms on the diagonal
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(USJudgeRatings[1:5], panel = panel.smooth,
cex = 1.5, pch = 24, bg = "light blue",
diag.panel = panel.hist, cex.labels = 2, font.labels = 2)
## put (absolute) correlations on the upper panels,
## with size proportional to the correlations.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)
pairs(iris[-5], log = "xy") # plot all variables on log scale
pairs(iris, log = 1:4, # log the first four
main = "Lengths and Widths in [log]", line.main=1.5, oma=c(2,2,3,2))
## trackFeaturesTable is a data frame that is not defined in these notes
pairs(trackFeaturesTable[1:2], main = "Edgar Anderson's Iris Data", pch = 21, panel = panel.smooth)
# optional extras: diag.panel = panel.hist, bg = c("red", "green3", "yellow")[unclass(trackFeaturesTable$Species)]
library(car)
scatterplotMatrix(~ DSpeed8 + Espeed8 | Species, data = trackFeaturesTable)
Scatterplot Matrices
Description
Scatterplot matrices with univariate displays down the diagonal; spm is an abbreviation for scatterplot.matrix. This function just sets up a call to pairs.
Usage
scatterplot.matrix(x, ...)
## S3 method for class 'formula':
scatterplot.matrix(formula, data=NULL, subset, ...)
## Default S3 method:
scatterplot.matrix(x, labels=colnames(x),
diagonal=c("density", "boxplot", "histogram", "oned", "qqplot", "none"),
adjust=1, nclass, plot.points=TRUE, smooth=TRUE, span=0.5, reg.line=lm,
transform=FALSE, ellipse=FALSE, levels=c(.5, .9), robust=FALSE,
groups=FALSE, by.groups=FALSE, col=palette(),
pch=1:n.groups, lwd=1, lwd.smooth=lwd,
cex=par("cex"), cex.axis=par("cex.axis"), cex.labels=NULL,
cex.main=par("cex.main"),
legend.plot=length(levels(groups)) > 1, ...)
spm(x, ...)
Arguments
x | a data matrix, numeric data frame, or formula. |
formula | a one-sided "model" formula, of the form ~ x1 + x2 + ... + xk or ~ x1 + x2 + ... + xk | z, where z evaluates to a factor or other variable to divide the data into groups. |
data | for scatterplot.matrix.formula, a data frame within which to evaluate the formula. |
subset | expression defining a subset of observations. |
labels | variable labels (for the diagonal of the plot). |
diagonal | contents of the diagonal panels of the plot. |
adjust | relative bandwidth for density estimate, passed to the density function. |
nclass | number of bins for histogram, passed to the hist function. |
plot.points | if TRUE, the points are plotted in each off-diagonal panel. |
smooth | if TRUE, a lowess smooth is plotted in each off-diagonal panel. |
span | span for the lowess smoother. |
reg.line | if not FALSE, a line is plotted using the function given by this argument; e.g., using rlm in package MASS plots a robust-regression line. |
transform | if TRUE, multivariate normalizing Box-Cox transformations are computed and plotted; if a vector of powers, one for each variable, these are applied as Box-Cox power transformations prior to plotting. |
ellipse | if TRUE, data-concentration ellipses are plotted in the off-diagonal panels. |
levels | level or levels at which concentration ellipses are plotted; the default is c(.5, .9). |
robust | if TRUE, use the cov.trob function in the MASS package to calculate the center and covariance matrix for the data ellipse. |
groups | a factor or other variable dividing the data into groups; groups are plotted with different colors and plotting characters. |
by.groups | if TRUE, regression lines are fit by groups. |
pch | plotting characters for points; default is the plotting characters in order (see par). |
col | colors for points and lines; the default is the colors in the current color palette, starting at the second entry (see palette and par). |
lwd | width of linear-regression lines (default 1). |
lwd.smooth | width for smooth regression lines (default is the same as lwd). |
cex, cex.axis, cex.labels, cex.main | set sizes of various graphical elements (see par). |
legend.plot | if TRUE, a legend for the groups is plotted in the bottom-right cell. |
... | arguments to pass down. |
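As a sketch of the grouped formula form in the table, the call below conditions on Species in the built-in iris data. It assumes the car package is installed and uses scatterplotMatrix(), the function name that appears in the example earlier in these notes. Only the formula and data arguments are relied on, since other argument names may differ between car versions; grouped.formula is a hypothetical variable name.

```r
# One-sided formula; the variable after | divides the data into groups.
grouped.formula <- ~ Sepal.Length + Sepal.Width + Petal.Length | Species

# Assumes the car package is installed; the plot is skipped otherwise.
if (requireNamespace("car", quietly = TRUE)) {
  car::scatterplotMatrix(grouped.formula, data = iris)
}
```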
Scatterplot matrices in R
I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is pairs(). Producing these plots can be helpful in exploring your data, especially using the second method below.
Try it out on the built-in iris dataset. (This data set gives the measurements in cm of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.)
# Load the iris dataset.
data(iris)
# Plot #1: Basic scatterplot matrix of the four measurements
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris)
Looking at the pairs help page I found that there's another built-in function, panel.smooth(), that can be used to add a smooth curve (computed by lowess) to each plot in a scatterplot matrix. Pass this function to the lower.panel argument of the pairs function. The panel.cor() function below computes the absolute correlation between each pair of variables and displays it in the upper panels, with the font size proportional to the absolute value of the correlation.
# panel.smooth function is built in.
# panel.cor puts correlation in upper panels, size proportional to correlation
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
# Plot #2: same as above, but add lowess smoother in lower and correlation in upper
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris,
lower.panel=panel.smooth, upper.panel=panel.cor,
pch=20, main="Iris Scatterplot Matrix")
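Finally, you can produce a similar plot using ggplot2, with the diagonal showing the kernel density. Note that plotmatrix() was deprecated and later removed from ggplot2, so the call below runs only on old versions; ggpairs() from the GGally package is the usual replacement.
# Plot #3: similar plot using ggplot2
# install.packages("ggplot2") ## uncomment to install ggplot2
library(ggplot2)
# plotmatrix() exists only in old ggplot2 releases; on current versions try GGally::ggpairs(iris[1:4])
plotmatrix(with(iris, data.frame(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)))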