How to log our own messages in Spark
We can do it like this:
import org.apache.log4j.{Level, LogManager}
import org.apache.spark._

object app {
  def main(args: Array[String]) {
    // Log from the driver using the log4j root logger
    val log = LogManager.getRootLogger
    log.setLevel(Level.WARN)

    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)

    log.warn("Hello demo")
    val data = sc.parallelize(1 to 100000)
    log.warn("I am done")
  }
}
These messages are printed to the shell and recorded in the log, because this code runs on the driver.
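Relatedly, if Spark's own INFO output drowns out these messages in the shell, you can raise the level of the framework's loggers from the driver with the same log4j 1.x API. This is just a minimal sketch; the logger names below are the standard Spark (and, on older versions, Akka) package names, not anything defined in this article:

import org.apache.log4j.{Level, Logger}

// Quiet down the framework's own logging so our warnings stand out
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)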
However, if we do something like this:
val log = LogManager.getRootLogger
val data = sc.parallelize(1 to 100000)

data.map { value =>
  // This closure runs on the executors, not on the driver
  log.info(value)
  value.toString
}
the logs will not be produced, because the logger captured by the map closure is not serializable, so Spark cannot ship the closure to the executors.
So we can change it to the following:
import org.apache.log4j.{Level, LogManager, PropertyConfigurator}
import org.apache.spark._
import org.apache.spark.rdd.RDD

class Mapper(n: Int) extends Serializable {
  // @transient + lazy: the logger is not serialized with the instance;
  // each executor re-creates it the first time it is used
  @transient lazy val log = org.apache.log4j.LogManager.getLogger("myLogger")

  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map { i =>
      log.warn("mapping: " + i)
      (i + n).toString
    }
}

object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}

object app {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.setLevel(Level.WARN)

    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)

    log.warn("Hello demo")

    val data = sc.parallelize(1 to 100000)
    val mapper = Mapper(1)
    val other = mapper.doSomeMappingOnDataSetAndLogIt(data)
    other.collect()

    log.warn("I am done")
  }
}
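An alternative, if you do not want to wrap the logic in a serializable class, is to create the logger inside the task itself so that nothing non-serializable is captured by the closure. The sketch below is not from the original article; the object and method names are hypothetical, and it simply uses mapPartitions so the logger is created once per partition rather than once per record:

import org.apache.log4j.LogManager
import org.apache.spark.rdd.RDD

object LogInsideTasks {
  // Hypothetical helper: log each element while mapping, without a wrapper class
  def mapAndLog(rdd: RDD[Int]): RDD[String] =
    rdd.mapPartitions { iter =>
      // Created on the executor, once per partition, so nothing needs to be serialized
      val log = LogManager.getLogger("myLogger")
      iter.map { i =>
        log.warn("mapping: " + i)
        i.toString
      }
    }
}

Either way, remember that statements logged inside map or mapPartitions execute on the executors, so they end up in the executor logs (viewable through the Spark UI), not in the driver's shell.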
This article has shown how to implement custom logging in Apache Spark: how to set the log level with a working example, and how to make log statements work across the cluster by wrapping the mapping logic in a serializable class.