File Streams
>>> from pyspark import SparkContext
>>> from pyspark.streaming import StreamingContext
>>> ssc = StreamingContext(sc, 10)   # batch interval of 10 seconds; `sc` is the SparkContext provided by the pyspark shell
>>> lines = ssc.textFileStream('file:///usr/local/spark/mycode/streaming/logfile')
>>> words = lines.flatMap(lambda line: line.split(' '))
>>> wordCounts = words.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)
>>> wordCounts.pprint()
>>> ssc.start()
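For readers who want to run this outside the pyspark shell, below is a minimal standalone sketch of the same file-stream word count, assuming the same monitored directory as above. The application name and the awaitTermination() call are illustrative additions rather than part of the original session; the script would be launched with spark-submit.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    # A standalone script has no prebuilt `sc`, so create the SparkContext explicitly.
    sc = SparkContext(appName="FileStreamWordCount")   # appName is illustrative
    ssc = StreamingContext(sc, 10)                     # 10-second batch interval, as above

    # textFileStream monitors the directory and picks up files created after the stream starts.
    lines = ssc.textFileStream('file:///usr/local/spark/mycode/streaming/logfile')
    words = lines.flatMap(lambda line: line.split(' '))
    wordCounts = words.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)
    wordCounts.pprint()

    ssc.start()
    ssc.awaitTermination()   # block until the streaming computation is stopped

Once the stream is started, copying a new text file into the logfile directory makes its word counts appear in the next 10-second batch; files already present in the directory before start() are not reprocessed.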
