Scala for Expressions Explained

In Scala, a for expression usually takes one of the following forms:

for (p <- e) e'  
for (p <- e if g) e'  
for (p <- e; p' <- e' ...) e''

and the corresponding yield forms:

for (p <- e) yield e'  
for (p <- e if g) yield e'  
for (p <- e; p' <- e' ...) yield e''

Here p and p' are Scala patterns; e, e', and e'' are expressions; and g is a Boolean expression.

According to The Scala Language Specification, Version 2.7, the compiler expands the for expressions above into the following forms (ignoring the case where p is a more complex pattern):

for (p <- e) e'                 => e.foreach { case p => e' } 
for (p <- e if g) e'            => for (p <- e.filter{ (x1,...,xn) => g }) e' => ..  
for (p <- e; p' <- e' ...) e''  => e.foreach{ case p => for (p' <- e' ...) e'' }

and, for the yield forms:

for (p <- e) yield e'                => e.map { case p => e' }  
for (p <- e if g) yield e'           => for (p <- e.filter{ (x1,...,xn) => g }) yield e' 
for (p <- e; p' <- e' ...) yield e'' => e.flatMap { case p => for (p' <- e' ...) yield e'' }
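
The yield rules can be checked directly on a standard List. The following is a minimal sketch; the object name DesugarDemo and the sample values are made up for illustration, and the hand-written expansions follow the rules listed above rather than actual compiler output.

```scala
object DesugarDemo {
  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  // for (p <- e) yield e'  corresponds to  e.map { case p => e' }
  val doubledFor = for (x <- xs) yield x * 2
  val doubledMap = xs.map { case x => x * 2 }

  // With two generators, the outer one becomes flatMap and the inner one map.
  val sumsFor = for (x <- xs; y <- ys) yield x + y
  val sumsFlatMap = xs.flatMap { case x => ys.map { case y => x + y } }

  def main(args: Array[String]): Unit = {
    println(doubledFor == doubledMap) // true
    println(sumsFor == sumsFlatMap)   // true
  }
}
```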

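Note that this translation happens before type checking. In other words, there are no further restrictions on the signatures of the four methods map, filter, flatMap, and foreach; the expanded code only has to pass type checking (personally, I find this somewhat similar to macro expansion in C). Once we understand how the Scala compiler translates for expressions, we can give them our own meaning.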
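
As an illustration of giving for expressions a custom meaning, here is a minimal sketch with a hypothetical wrapper type Box (not part of any library). It is not a collection at all; it only supplies methods with the right names, and that appears to be enough for the compiler to accept it as a generator.

```scala
// Hypothetical single-value wrapper, defined only for this example.
final case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
  def foreach[U](f: A => U): Unit = { f(value); () }
}

object BoxDemo {
  // Expands to Box(2).flatMap { a => Box(3).map { b => a * b } }
  val product: Box[Int] = for (a <- Box(2); b <- Box(3)) yield a * b

  def main(args: Array[String]): Unit = {
    println(product) // Box(6)
    // Expands to Box("hi").foreach { s => println(s) }
    for (s <- Box("hi")) println(s)
  }
}
```

Box defines no filter or withFilter, so a generator with a guard (for (a <- Box(2) if a > 0) ...) would not compile for this type.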

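Note that in versions before Scala 2.8, for (p <- e if g) and for (p <- e) { if (g) ... } are different. The former makes two passes (filter traverses e to build the filtered collection, then foreach traverses the result), whereas the latter makes only one. The two usually produce the same result, but with large collections the performance difference becomes noticeable. For example, summing all even numbers from 1 up to (but not including) 1000000: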

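// Guard inside the for expression: Scala 2.7 expands this to set.filter(...).foreach(...)
def innerif(m: Int) = {
    val set = 1 until m
    var sum = 0
    for (num <- set; if (num % 2 == 0)) sum += num
}

// Test inside the loop body: expands to a single set.foreach(...)
def outerif(m: Int) = {
    val set = 1 until m
    var sum = 0
    for (num <- set) { if (num % 2 == 0) sum += num }
}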

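// Runs body(m) n times and prints the average wall-clock time in milliseconds.
def testMethod(n: Int)(m: Int)(body: Int => Unit): Unit = {
    var avgMilliSec = 0.0
    for (i <- 1 to n) {
        val start = System.currentTimeMillis()
        body(m)
        val end = System.currentTimeMillis()
        val time = end - start
        // Incremental mean: fold the i-th sample into the average of the first i-1.
        avgMilliSec = ((i - 1) * avgMilliSec + time) / i
    }
    println("avg time: " + avgMilliSec)
}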

testMethod(10)(1000000)(innerif)
testMethod(10)(1000000)(outerif)

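In the Scala 2.7.7 final REPL, innerif took about 142.2 ms on average and outerif about 15.7 ms.

In addition, when g and e' both refer to the same variable v and v is modified while the loop runs (in the example below, by the body e'), the actual result may differ from what we expect. Consider the following example: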

// Guard inside the for expression
def compress1[T](l: List[T]): List[T] = {
   var r = List(l.head)
   for (x <- l; if (x != r.last)) r = r ::: List(x)
   r
}

// Test inside the loop body
def compress2[T](l: List[T]): List[T] = {
   var r = List(l.head)
   for (x <- l) if (x != r.last) r = r ::: List(x)
   r
}

val cl = List('a, 'a, 'a, 'a, 'b, 'c, 'c, 'a, 'a, 'd, 'e, 'e, 'e, 'e)
compress1(cl)
compress2(cl)

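This example is meant to remove adjacent duplicate elements from a List. For instance, for List('a, 'a, 'a, 'a, 'b, 'c, 'c, 'a, 'a, 'd, 'e, 'e, 'e, 'e) the result should be List('a, 'b, 'c, 'a, 'd, 'e). In Scala 2.7, compress1 evaluates the guard against one and the same instance of r: the whole list is filtered first, and only then does the body run. compress2 filters and computes as it goes, so r may be different at each test. As a result, compress1 merely drops every element equal to the first one, while compress2 produces the expected list.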
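
The difference can be reproduced on a current Scala version by writing the two expansions out by hand. The sketch below assumes Scala 2.8 or later (where withFilter exists); the names compressEager and compressLazy are made up for this example, and strings stand in for the symbol literals.

```scala
object CompressDemo {
  // Mimics the Scala 2.7 expansion: filter runs over the whole list first,
  // while r still holds only the first element.
  def compressEager[T](l: List[T]): List[T] = {
    var r = List(l.head)
    l.filter(x => x != r.last).foreach { x => r = r ::: List(x) }
    r
  }

  // Mimics the Scala 2.8 expansion: the guard is tested right before each
  // run of the body, so it always sees the current r.
  def compressLazy[T](l: List[T]): List[T] = {
    var r = List(l.head)
    l.withFilter(x => x != r.last).foreach { x => r = r ::: List(x) }
    r
  }

  val input = List("a", "a", "a", "a", "b", "c", "c", "a", "a", "d", "e", "e", "e", "e")

  def main(args: Array[String]): Unit = {
    println(compressEager(input)) // List(a, b, c, c, d, e, e, e, e)
    println(compressLazy(input))  // List(a, b, c, a, d, e)
  }
}
```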

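Scala 2.7's treatment of for (p <- e; if g) clearly goes against the habits of C and Java programmers. Scala 2.8 therefore changed it so that for (p <- e; if g) is equivalent in effect to for (p <- e) { if (g) ... }, which matches what C and Java programmers expect. Why only "in effect"? Because Scala 2.8 translates it to for (p <- e.withFilter(...)). Like filter, the withFilter method is defined in scala.collection.TraversableLike and takes a function of type (A) => Boolean as its argument. The difference is that it does not build a new collection of the elements that satisfy the predicate; instead it returns an instance of the WithFilter class. WithFilter acts as a proxy for the original collection: its map, flatMap, and foreach implementations apply the predicate to each element of the original collection as they traverse it. This gives the same effect as for (p <- e) { if (g) ... }. If you are interested, see the post Martin Odersky published on scala-lang about this issue.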
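
The deferred behaviour of withFilter can be observed with a predicate that counts its own calls. This is a minimal sketch assuming Scala 2.8 or later; the counter predicateCalls and the object name WithFilterDemo exist only for this illustration.

```scala
object WithFilterDemo {
  val xs = List(1, 2, 3, 4, 5, 6)

  // Counts how often the predicate has been evaluated (illustration only).
  var predicateCalls = 0
  def isEven(x: Int): Boolean = { predicateCalls += 1; x % 2 == 0 }

  // withFilter only records the predicate; no element is tested yet.
  val proxy = xs.withFilter(isEven)
  val callsAfterWithFilter = predicateCalls

  // The predicate runs only when map walks the original collection.
  val viaProxy = proxy.map(_ * 10)

  // The guard form is translated to the same withFilter/map chain.
  val viaFor = for (x <- xs if isEven(x)) yield x * 10

  def main(args: Array[String]): Unit = {
    println(callsAfterWithFilter) // 0
    println(viaProxy)             // List(20, 40, 60)
    println(viaProxy == viaFor)   // true
  }
}
```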

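scala.collection.TraversableLike is one of the base traits of the Scala collections framework. It defines many basic collection operations, such as flatMap, map, foreach, filter, and withFilter. WithFilter is defined inside TraversableLike and extends scala.collection.generic.FilterMonadic, which declares four methods: foreach, map, flatMap, and withFilter.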

Appendix: source code of the WithFilter class

	  /** A class supporting filtered operations. Instances of this class are
	   *  returned by method `withFilter`.
	   */
	  class WithFilter(p: A => Boolean) extends FilterMonadic[A, Repr] {
	
	    /** Builds a new collection by applying a function to all elements of the
	     *  outer $coll containing this `WithFilter` instance that satisfy predicate `p`.
	     *
	     *  @param f      the function to apply to each element.
	     *  @tparam B     the element type of the returned collection.
	     *  @tparam That  $thatinfo
	     *  @param bf     $bfinfo
	     *  @return       a new collection of type `That` resulting from applying
	     *                the given function `f` to each element of the outer $coll
	     *                that satisfies predicate `p` and collecting the results.
	     *
	     *  @usecase def map[B](f: A => B): $Coll[B]
	     * 
	     *  @return       a new $coll resulting from applying the given function
	     *                `f` to each element of the outer $coll that satisfies
	     *                predicate `p` and collecting the results.
	     */
	    def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
	      val b = bf(repr)
	      for (x <- self) 
	        if (p(x)) b += f(x)
	      b.result
	    }
	
	    /** Builds a new collection by applying a function to all elements of the
	     *  outer $coll containing this `WithFilter` instance that satisfy
	     *  predicate `p` and concatenating the results.
	     *
	     *  @param f      the function to apply to each element.
	     *  @tparam B     the element type of the returned collection.
	     *  @tparam That  $thatinfo
	     *  @param bf     $bfinfo
	     *  @return       a new collection of type `That` resulting from applying
	     *                the given collection-valued function `f` to each element
	     *                of the outer $coll that satisfies predicate `p` and
	     *                concatenating the results.
	     *
	     *  @usecase def flatMap[B](f: A => TraversableOnce[B]): $Coll[B]
	     *
	     *  @return       a new $coll resulting from applying the given collection-valued function
	     *                `f` to each element of the outer $coll that satisfies predicate `p` and concatenating the results.
	     */
	    def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
	      val b = bf(repr)
	      for (x <- self) 
	        if (p(x)) b ++= f(x).seq
	      b.result
	    }
	
	    /** Applies a function `f` to all elements of the outer $coll containing
	     *  this `WithFilter` instance that satisfy predicate `p`.
	     *
	     *  @param  f   the function that is applied for its side-effect to every element.
	     *              The result of function `f` is discarded.
	     *             
	     *  @tparam  U  the type parameter describing the result of function `f`.
	     *              This result will always be ignored. Typically `U` is `Unit`,
	     *              but this is not necessary.
	     *
	     *  @usecase def foreach(f: A => Unit): Unit
	     */   
	    def foreach[U](f: A => U): Unit = 
	      for (x <- self) 
	        if (p(x)) f(x)
	
	    /** Further refines the filter for this $coll.
	     *
	     *  @param q   the predicate used to test elements.
	     *  @return    an object of class `WithFilter`, which supports
	     *             `map`, `flatMap`, `foreach`, and `withFilter` operations.
	     *             All these operations apply to those elements of this $coll which
	     *             satisfy the predicate `q` in addition to the predicate `p`.
	     */
	    def withFilter(q: A => Boolean): WithFilter = 
	      new WithFilter(x => p(x) && q(x))
	  }