[Spark 89] Testing Spark Streaming when processing lags behind reading

Latest recommended article published 2025-10-29 14:16:05

Original

Licensed under CC 4.0 BY-SA

Article tags:

#BigData #ui #java

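This article tests how Spark Streaming behaves when processing is slower than data reading. In the test, one RDD is created every second, but each takes 4 seconds to process, so the waiting-queue count does not match what one might expect. The UI shows that Spark Streaming ran for 95 seconds in total and processed 23 batches, each taking 4 seconds on average. The article cites Tathagata Das to explain the waiting-queue count and stresses the importance of processing time and scheduling delay. If the processing time exceeds the batch interval, reducing the processing time may need to be considered.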

1. Test code

package spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming._

object NetCatStreamingWordCountDelay {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetCatStreamingWordCountDelay")
    // Local mode with 3 threads: one is taken by the socket receiver, the rest run jobs
    conf.setMaster("local[3]")
    // Batch interval of 1 second: a new batch (RDD) is created every second
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("192.168.26.140", 9999)
    // Simulate slow processing: each batch takes about 4 seconds
    lines.foreachRDD(rdd => {
      // Printed for every batch, including batches whose RDD is empty
      println("This is the output even if rdd is empty")
      Thread.sleep(4 * 1000)
    })
    ssc.start()
    ssc.awaitTermination()
  }
}

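In the test code above:

1. The batch interval is set to 1 second, so Spark Streaming creates one RDD every second

2. Each batch takes 4 seconds to process, so processing lags behind data reading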
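
The two points above can be checked with a little arithmetic. The following is a minimal sketch in plain Scala, not Spark code: it assumes batches are processed strictly one at a time (the default behaviour when `spark.streaming.concurrentJobs` is left unset) and that every batch takes exactly 4 seconds. The names `BacklogSketch`, `Batch` and `simulate` are hypothetical and exist only for this illustration.

```scala
// Simplified model of a streaming backlog; all names here are hypothetical.
object BacklogSketch {
  // Times are in seconds since the streaming context started.
  final case class Batch(index: Int, generatedAt: Long, startedAt: Long, finishedAt: Long) {
    // Time the batch spent queued before its processing began
    def schedulingDelay: Long = startedAt - generatedAt
  }

  // Batch i is generated at i * intervalSec. Batches are processed one at a
  // time, in order, and each takes processingSec seconds (an assumption).
  def simulate(intervalSec: Long, processingSec: Long, batches: Int): Vector[Batch] =
    (1 to batches).foldLeft(Vector.empty[Batch]) { (done, i) =>
      val generatedAt = i * intervalSec
      val previousEnd = done.lastOption.map(_.finishedAt).getOrElse(0L)
      // A batch starts once it exists and the previous batch has finished
      val startedAt = math.max(generatedAt, previousEnd)
      done :+ Batch(i, generatedAt, startedAt, startedAt + processingSec)
    }

  def main(args: Array[String]): Unit = {
    val elapsedSec = 95L
    val all = simulate(intervalSec = 1L, processingSec = 4L, batches = elapsedSec.toInt)
    val completed = all.count(_.finishedAt <= elapsedSec)
    println(s"batches completed after $elapsedSec s: $completed")
    all.take(4).foreach { b =>
      println(s"batch ${b.index}: scheduling delay ${b.schedulingDelay} s")
    }
  }
}
```

Under these assumptions the model completes 23 batches in the first 95 seconds, and the scheduling delay grows by 3 seconds with every batch (0, 3, 6, 9, ...), because each batch adds 4 seconds of work while only 1 second of new time becomes available. The real job's timings will differ slightly, since actual processing time is never exactly 4 seconds.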