Well, to glance at how a function is built, str(fun) is handier; just typing fun prints too much clutter.
3.1 The *apply family (i.e., simplified loops)
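The various *apply functions (lapply, sapply, apply, tapply, mapply), plus an auxiliary function, split, which is more powerful than the plain subsetting we used before. The last one, mapply, is a multivariate version of lapply.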
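lapply's loop is implemented in C (its short R body hands the work to .Internal). Being written in C makes it faster, but it also means the usual inspection tools reveal little: formals(fun) to see the formal arguments, body(fun) to see the function body, page(fun) to display the source in a pager, or simply typing fun. The ... at the end of lapply's signature holds extra arguments that are passed on to FUN; remember this.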
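A minimal sketch of these inspection tools applied to lapply itself. The exact body printed depends on your R version; the call to .Internal is where the C code takes over. The na.rm example is only an illustration of how ... forwards arguments.

```r
# Signature at a glance: str() prints just the argument list
str(lapply)

# Formal arguments: X, FUN and the ... placeholder
names(formals(lapply))

# The R-level body is short; the real loop runs in C via .Internal()
body(lapply)

# Anything after FUN is forwarded through ... to FUN on every call
lapply(list(a = c(1, NA, 3)), mean, na.rm = TRUE)
```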
lapply always returns a list, regardless of the type of its input.
For example:
> x <- list(a = 1:5, b = rnorm(10))
> x
$a
[1] 1 2 3 4 5
$b
[1] -0.8774361 -2.0198487 -1.3250659 -1.0528759
[5] 1.9882519 0.2930956 -1.3617602 -1.0195050
[9] 0.8024887 0.1371713
> lapply(x,mean)
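To see the "always a list" rule in isolation, here is a small sketch that feeds lapply a plain integer vector (the doubling function is just an illustrative choice):

```r
x <- 1:3
res <- lapply(x, function(v) v * 2)
class(res)   # "list", even though x is an integer vector
length(res)  # one list element per input element
unlist(res)  # flatten back to a vector when needed
```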
runif draws from the uniform distribution: runif(1) generates one random number, runif(2) generates two, and so on. For example:
> x <- 1:4
> lapply(x,runif)
[[1]]
[1] 0.8964254
[[2]]
[1] 0.94694692 0.02346353
[[3]]
[1] 0.4468687 0.6815853 0.5211994
[[4]]
[1] 0.01230543 0.75179793 0.26955648 0.81547236
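Arguments given after FUN are passed through ... to runif, so the draws can come from a different range:
> lapply(x, runif, min = 0, max = 10)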
[[1]]
[1] 8.552639
[[2]]
[1] 9.914502 9.093752
[[3]]
[1] 4.622779 4.540884 5.556758
[[4]]
[1] 3.066473 3.867984 3.878329 5.600041
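A sketch that checks the two things the output above shows: each element of x becomes the n argument of runif, and min/max reach runif through .... The seed is only there to make the run reproducible.

```r
set.seed(1)  # only so the run is reproducible
x <- 1:4
# n comes from each element of x; min and max are forwarded through ... to runif
res <- lapply(x, runif, min = 0, max = 10)
lengths(res)        # 1 2 3 4
range(unlist(res))  # every draw lies inside [0, 10]
```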
Another handy feature of lapply is that it works well with anonymous functions:
> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
> x
$a
[,1] [,2]
[1,] 1 3
[2,] 2 4
$b
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> lapply(x, function(elt) elt[, 1])
$a
[1] 1 2
$b
[1] 1 2 3
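This creates a function on the spot with one formal argument, elt, i.e. function(elt) { elt[, 1] }. It exists only for the duration of the lapply call and is discarded afterwards, which is why it is called an anonymous function. This idiom is used very heavily.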
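A small sketch contrasting the inline anonymous function with an equivalent named helper; get_first_col is a hypothetical name introduced only for the comparison.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Anonymous function: defined inline, never given a name
first_cols <- lapply(x, function(elt) elt[, 1])

# The same thing with a named helper (hypothetical name), for comparison
get_first_col <- function(elt) elt[, 1]
identical(first_cols, lapply(x, get_first_col))  # TRUE
```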
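Another interesting experiment, comparing sapply with lapply. (This output is from R versions before 4.0; since R 4.0.0, class() of a matrix returns c("matrix", "array"), so sapply(x, class) gives a 2 × 2 character matrix instead.)
> sapply(x, class)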
a b
"matrix" "matrix"
> lapply(x,class)
$a
[1] "matrix"
$b
[1] "matrix"
> class(lapply(x,class))
[1] "list"
> class(sapply(x,class))
[1] "character"
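A version-independent sketch of the same simplification rule, using nrow and dim instead of class (these two functions are just convenient choices): sapply collapses length-1 results into a vector and equal-length results into a matrix, while lapply never simplifies.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Each call returns a length-1 result, so sapply collapses them into a named vector
sapply(x, nrow)
# Each call returns a length-2 result, so sapply builds a matrix (one column per element)
sapply(x, dim)
# lapply never simplifies
lapply(x, nrow)
```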
This example collapses the third dimension: the array can be viewed as ten 2 × 2 matrices.
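The apply call itself is not shown above, so this is a sketch under the assumption that the array is 2 × 2 × 10 filled with rnorm values and the goal is the element-wise mean across the ten matrices. rowMeans with dims = 2 is shown as a faster equivalent.

```r
set.seed(1)
# Assumed data: a 2 x 2 x 10 array, i.e. ten 2 x 2 matrices stacked along dimension 3
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

# Keep dimensions 1 and 2, average over the third: result is a single 2 x 2 matrix
m1 <- apply(a, c(1, 2), mean)

# rowMeans with dims = 2 averages over dimension 3 as well, but faster
m2 <- rowMeans(a, dims = 2)
all.equal(m1, m2)  # TRUE
```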
> class(tapply(x, f, mean))
[1] "array"
tapply simplifies its result by default (simplify = TRUE); with simplify = FALSE the result is a list (strictly, an array of mode list).
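The definitions of x and f are not shown in these notes. This sketch assumes a setup modeled on the course example (three groups of ten values, labels from gl()), which matches the shape of the split output that follows.

```r
set.seed(1)
# Assumed setup (not shown in the notes): 30 values in three groups of 10
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

m <- tapply(x, f, mean)                    # simplified: a 1-d numeric array
class(m)                                   # "array"
l <- tapply(x, f, mean, simplify = FALSE)  # not simplified: an array of mode list
is.list(l)                                 # TRUE
```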
Next comes split, a real power tool, especially in combination with apply, lapply, or sapply. Using the example above:
> split(x, f)
$`1`
[1] 0.79150069 -0.08758313 0.11875259 1.06214913
[5] 0.18936631 1.52672816 0.38512782 0.09617051
[9] 0.49608406 -0.14524562
$`2`
[1] 0.1372091 0.5180116 0.2609488 0.3998524 0.8659073
[6] 0.9294329 0.8947693 0.6235423 0.2145166 0.3253637
$`3`
[1] 1.2545476 0.2108688 1.3833433 1.2230497
[5] 0.2912928 0.2959154 0.2218635 -0.3192566
[9] 2.6216614 -0.3427608
A common combination is lapply with split:
> lapply(split(x, f), mean)
$`1`
[1] 0.4433051
$`2`
[1] 0.5169554
$`3`
[1] 0.6840525
In this case tapply(x, f, mean) is nicer, since it needs less code.
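A sketch comparing the three routes to the same group means, again assuming the course-style setup for x and f since the notes do not show it:

```r
set.seed(1)
# Assumed setup, same shape as the course example
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

via_split  <- lapply(split(x, f), mean)  # named list: $`1`, $`2`, $`3`
via_sapply <- sapply(split(x, f), mean)  # same numbers as a named vector
via_tapply <- tapply(x, f, mean)         # same numbers, one call

all.equal(unname(via_sapply), as.vector(via_tapply))  # TRUE
```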
Of course, split can handle more complex objects too; it works nicely on data frames:
> s <- split(airquality, airquality$Month)
> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
$`5`
Ozone Solar.R Wind
NA NA 11.62258
$`6`
Ozone Solar.R Wind
NA 190.16667 10.26667
$`7`
Ozone Solar.R Wind
NA 216.483871 8.941935
$`8`
Ozone Solar.R Wind
NA NA 8.793548
$`9`
Ozone Solar.R Wind
NA 167.4333 10.1800
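The NAs above appear because Ozone and Solar.R contain missing values. A sketch of the usual fix: pass na.rm = TRUE to colMeans and let sapply simplify the result into a matrix.

```r
s <- split(airquality, airquality$Month)
cols <- c("Ozone", "Solar.R", "Wind")

# na.rm = TRUE is passed to colMeans, so missing values are skipped;
# sapply then simplifies the five length-3 results into a 3 x 5 matrix
means <- sapply(s, function(x) colMeans(x[, cols], na.rm = TRUE))
means
```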
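Now a fancier split: splitting on more than one factor at once, which is great for contingency-table-style grouping.
Note: don't mistake a level like 1.2 for the decimal number 1.2. Read it like "chapter 1, section 2": level 1 of the first factor combined with level 2 of the second. With two factors that each have several levels, this ties in with statistics, and it starts to make sense.
Of course, some of these combined levels will be empty.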
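A sketch of splitting on two factors (the data and factors are illustrative). interaction() shows the combined levels; split keeps every combination by default, and drop = TRUE removes the empty ones.

```r
set.seed(1)
x  <- rnorm(10)
f1 <- gl(2, 5)  # 1 1 1 1 1 2 2 2 2 2
f2 <- gl(5, 2)  # 1 1 2 2 3 3 4 4 5 5

levels(interaction(f1, f2))           # 10 combined levels such as "1.2", "2.3"
all_groups <- split(x, list(f1, f2))  # one element per combination, some empty
kept <- split(x, list(f1, f2), drop = TRUE)  # empty combinations removed
names(kept)
```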
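Calling rep(1:4, 4:1) directly returns a single vector, and as.list(rep(1:4, 4:1)) returns a list with one length-1 element per position, so neither matches mapply. That said, split(rep(1:4, 4:1), rep(1:4, 4:1)) also gives the same result (apart from the element names).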
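A sketch checking the comparison above: mapply walks 1:4 and 4:1 in parallel, while rep(1:4, 4:1) is flat, and split regroups it into the same contents.

```r
# mapply calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1), walking both vectors in parallel
a <- mapply(rep, 1:4, 4:1)

# rep(1:4, 4:1) gives one flat vector instead of a list
b <- rep(1:4, 4:1)

# split() regroups that flat vector by value: same contents, but named "1".."4"
d <- split(b, b)

isTRUE(all.equal(a, unname(d)))  # TRUE
```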
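According to the instructor, the approach in the first figure does not give the intended result, while the one in the second figure does.
Note the difference between the two.
This technique vectorizes a function whose arguments are not vectorized; above, for instance, mapply effectively vectorized the arguments of rep. This is called instant vectorization.
In other words, the mapply call above is equivalent to list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2)). Note that lapply(1:5, noise, mean = 1, sd = 2) is not the same thing: it keeps mean fixed at 1 and varies only n.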
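The notes use noise() without showing its definition, so this sketch assumes noise(n, mean, sd) simply wraps rnorm. Under that assumption, it shows why the direct call falls short and that mapply matches the hand-written list (same seed, same order of calls).

```r
# Assumed definition (not shown in the notes): n draws from Normal(mean, sd)
noise <- function(n, mean, sd) rnorm(n, mean, sd)

# Not what we want: rnorm takes length(n) as the count, so only 5 numbers come back
length(noise(1:5, 1:5, 2))

# Instant vectorization: mapply calls noise(1, 1, 2), noise(2, 2, 2), ..., noise(5, 5, 2)
set.seed(1); a <- mapply(noise, 1:5, 1:5, 2)
set.seed(1); b <- list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2),
                       noise(4, 4, 2), noise(5, 5, 2))
identical(a, b)  # TRUE
lengths(a)       # 1 2 3 4 5
```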
printmessage2 <- function(x) {
  if (is.na(x))
    print("x is a missing value!")
  else if (x > 0)
    print("x is greater than zero")
  else
    print("x is less than or equal to zero")
  invisible(x)
}
What invisible does (from its help page): Return a (temporarily) invisible copy of an object.
This function can be useful when it is desired to have functions return values which can be assigned, but which do not print when they are not assigned.
For example:
# These functions both return their argument
f1 <- function(x) x
f2 <- function(x) invisible(x)
f1(1) # prints
f2(1) # does not
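A sketch that makes the difference observable without relying on what the console prints: withVisible() reports whether a value would auto-print, and an invisible value can still be assigned or printed explicitly.

```r
f1 <- function(x) x
f2 <- function(x) invisible(x)

# withVisible() returns the value plus a flag saying whether it would auto-print
withVisible(f1(1))$visible  # TRUE
withVisible(f2(1))$visible  # FALSE

# The value is still returned, so it can be assigned or printed explicitly
y <- f2(1)
print(y)
(f2(1))  # wrapping in parentheses also forces printing
```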
traceback, debug, recover
Debugging Summary
- There are three main indications of a problem/condition: message, warning, error
- Only an error is fatal
- When analyzing a function with a problem, make sure you can reproduce the problem, clearly state your expectations and how the output differs from your expectation
- Interactive debugging tools traceback, debug, browser, trace, and recover can be used to find problematic code in functions
- Debugging tools are not a substitute for thinking!
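A sketch of the "only an error is fatal" point, using a hypothetical demo function named check: after a message or a warning the function still runs to the end, while an error aborts it (here caught with tryCatch so the script keeps going).

```r
# Hypothetical demo function: signals one kind of condition, then tries to finish
check <- function(kind) {
  if (kind == "message") message("just a note")
  if (kind == "warning") warning("something looks off")
  if (kind == "error") stop("cannot continue")
  "finished"
}

suppressMessages(check("message"))  # message signaled, still returns "finished"
suppressWarnings(check("warning"))  # warning signaled, still returns "finished"

# An error aborts the function; tryCatch catches it so we can inspect it
tryCatch(check("error"), error = function(e) conditionMessage(e))  # "cannot continue"

# At the console, traceback() after an uncaught error shows the call stack,
# and debug(check) would let you step through check() line by line
```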