set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
loess smoother
The span parameter runs from 0 (very wiggly) to 1 (very smooth).
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 0.2)
loess does not scale well to large datasets (it needs O(n^2) memory), so when n exceeds 1000 a different smoothing algorithm is used by default.
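To make the effect of span concrete, here is a minimal sketch that calls stats::loess directly on synthetic data (not the diamonds sample) and compares a small span with a large one. The roughness helper is a hypothetical measure introduced only for this illustration; it is not part of ggplot2 or of loess.

```r
# Minimal sketch on synthetic data (an assumption, not the diamonds sample):
# compare how wiggly a loess fit is for a small and a large span.
set.seed(1410)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_wiggly <- loess(y ~ x, span = 0.2)  # each local fit uses about 20% of the points
fit_smooth <- loess(y ~ x, span = 1)    # each local fit uses all of the points

# Hypothetical helper: sum of squared second differences of the fitted curve.
# Larger values mean a rougher (less smooth) curve.
roughness <- function(fit) sum(diff(predict(fit), differences = 2)^2)

roughness_wiggly <- roughness(fit_wiggly)
roughness_smooth <- roughness(fit_smooth)

# The small-span fit is expected to be the rougher of the two.
roughness_wiggly > roughness_smooth
```

The same trade-off is what the span argument controls in the qplot call: a small span follows local bumps in the data, a large span averages them away.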
gam from the mgcv package fits a generalized additive model
library(mgcv)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "gam",
formula = y ~ s(x))
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "gam",
formula = y ~ s(x, bs = "cs"))
formula = y ~ s(x, bs = "cs") is the default when the data has more than 1000 points.
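lm fits a linear model
ns from the splines package adds natural splines
The formula parameter controls the fit when a linear model is used as the smoother. The first plot below uses the default, formula = y ~ x; the second uses formula = y ~ ns(x, 5).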
library(splines)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "lm")
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "lm",
formula = y ~ ns(x, 5))
rlm from the MASS package
A robust alternative to lm: the fit is much less sensitive to outliers.
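The notes give no code for rlm, so the following is a minimal sketch of what "less sensitive to outliers" means, assuming synthetic data with a single gross outlier (the data and the error measure are made up for illustration). In the plots above, the corresponding change would be to load MASS and pass method = "rlm" instead of method = "lm".

```r
library(MASS)  # provides rlm(); a recommended package bundled with most R installations

# Synthetic data (an assumption for illustration): a true line y = 2x + 1
# with small noise and one gross outlier.
set.seed(1410)
x <- 1:20
y_true <- 2 * x + 1
y <- y_true + rnorm(20, sd = 0.5)
y[10] <- 200  # the outlier

fit_lm  <- lm(y ~ x)   # ordinary least squares
fit_rlm <- rlm(y ~ x)  # robust fit, downweights large residuals

# Mean absolute distance between each fitted line and the true line.
err_lm  <- mean(abs(fitted(fit_lm)  - y_true))
err_rlm <- mean(abs(fitted(fit_rlm) - y_true))
c(lm = err_lm, rlm = err_rlm)
```

The least-squares line is pulled toward the outlier along its whole length, while the robust fit should stay close to the bulk of the points.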
jitter: jittered point plot
qplot(color, price/carat, data = diamonds, geom = "jitter", alpha = I(1/2))
Weighted bar chart
qplot(color, data = diamonds, geom = "bar", weight = carat) + scale_y_continuous("carat")
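As a rough check on what weight = carat does to the bar heights, the sketch below uses a hypothetical toy data frame in place of diamonds and computes both versions by hand: row counts per color for the unweighted chart, and summed carat per color for the weighted one.

```r
# Hypothetical toy data standing in for diamonds: one row per stone.
toy <- data.frame(
  color = c("D", "D", "E", "E", "E", "F"),
  carat = c(0.5, 1.0, 0.3, 0.3, 0.4, 2.0)
)

# Unweighted bar heights: number of rows per color.
counts <- table(toy$color)

# Weighted bar heights (weight = carat): total carat per color.
carat_totals <- tapply(toy$carat, toy$color, sum)

counts
carat_totals
```

This is why the y axis in the weighted plot is relabelled "carat": the bars no longer show counts but the total weight of the stones in each color.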