
Spark Streaming
Pengsen Ma
Articles in this column
9-4 Push-based Flume integration with Spark Streaming (2020-10-09)
Go to /home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/conf, create a new agent configuration file, and edit it:

[hadoop@hadoop000 conf]$ vi flume_push_streaming.conf

# exec-memory-avro.conf
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel
…
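On the Spark side of a push-style integration, the application starts a receiver that Flume's avro sink pushes events to, using the spark-streaming-flume module. A minimal sketch; the hostname and port are assumptions and must match the avro sink's settings:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver listens on the address the Flume avro sink pushes to
    // (hadoop000:41414 here is an assumption; use your sink's hostname/port).
    val flumeStream = FlumeUtils.createStream(ssc, "hadoop000", 41414)

    // Decode each Flume event body and run a word count over it.
    flumeStream.map(e => new String(e.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}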
Blacklist filtering (2020-08-31)
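A common way to implement blacklist filtering in Spark Streaming, sketched here under the assumption that the blacklist is a static in-memory list and that input records look like "20200831,zs", is to join each batch against the blacklist with transform:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlacklistFiltering {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("BlacklistFiltering")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical blacklist; in practice it could be loaded from a file or database.
    val blacklist = ssc.sparkContext.parallelize(Seq("zs", "ls")).map(name => (name, true))

    // Input lines are assumed to look like "20200831,zs"; key each line by the name field.
    val lines = ssc.socketTextStream("hadoop000", 9999)
    val filtered = lines.map(line => (line.split(",")(1), line))
      .transform(rdd => rdd.leftOuterJoin(blacklist)
        .filter { case (_, (_, flag)) => !flag.getOrElse(false) }  // drop blacklisted names
        .map { case (_, (line, _)) => line })

    filtered.print()
    ssc.start()
    ssc.awaitTermination()
  }
}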
Window Operations (using window functions) (2020-08-30)
Process, on a schedule, the data that arrived within a given time span. Spark Streaming also provides windowed computations, which allow you to apply transformations over a sliding window of data. The following figure illustrates this sliding window. As shown in the figure, every time the window slides over a source DStream, the source RDDs that fall within the window are combined and operated upon to produce the RDDs of the windowed DStream.
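The canonical example from the Spark docs is a windowed word count: counts over the last 30 seconds of data, recomputed every 10 seconds. A minimal sketch; the host and port are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("WindowWordCount")
val ssc = new StreamingContext(conf, Seconds(10))

val pairs = ssc.socketTextStream("hadoop000", 9999)
  .flatMap(_.split(" ")).map((_, 1))

// Reduce over a 30-second window that slides every 10 seconds;
// both durations must be multiples of the batch interval.
val windowedWordCounts =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

windowedWordCounts.print()
ssc.start()
ssc.awaitTermination()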
Design Patterns for using foreachRDD (2020-08-28)
dstream.foreachRDD is a powerful primitive that allows data to be sent out to external systems. Often, writing data to an external system requires creating a connection object (e.g. a TCP connection to a remote server) and using it to send data to a remote system.
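The pattern the post works toward, following the Spark docs, amortizes the connection cost by creating one connection per RDD partition rather than per record; ConnectionPool below is a placeholder for whatever pooling utility you use:

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // One connection per partition, fetched from a static, lazily initialized pool.
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  }
}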
Transformations on DStreams and Output Operations on DStreams (2020-08-24)
Similar to RDDs, transformations allow the data from the input DStream to be modified. DStreams support many of the transformations available on normal Spark RDDs. Some of the common ones are as follows:

Transformation | Meaning
map(func) | Return a new DStream by passing each element of the source DStream through a function func.
…
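To make the table concrete, a short sketch chaining common transformations with an output operation (the socket source is an assumption carried over from the other posts):

val lines = ssc.socketTextStream("hadoop000", 9999)   // input DStream
val words = lines.flatMap(_.split(" "))               // flatMap: each line becomes many words
val pairs = words.map(word => (word, 1))              // map: each word becomes a (word, 1) pair
val wordCounts = pairs.reduceByKey(_ + _)             // reduceByKey: sum the counts per word
wordCounts.print()                                    // output operation: print the first ten results of each batch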
Input DStreams and Receivers (2020-08-24)
Input DStreams are DStreams representing the stream of input data received from streaming sources. In the quick example, lines was an input DStream as it represented the stream of data received from the netcat server. Every input DStream is associated with a Receiver object, which receives the data from a source and stores it in Spark's memory for processing.
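One practical consequence worth showing in code: a receiver permanently occupies one core, so a local run needs at least two threads, one for the receiver and one for processing. A minimal sketch:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one thread runs the receiver, the other processes the received batches.
// With local[1] the receiver would starve the processing and nothing would be computed.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)  // input DStream backed by a receiver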
Discretized Streams (DStreams) (2020-08-21)
Discretized Stream, or DStream, is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data: either the input data stream received from a source, or the processed data stream generated by transforming the input stream.
StreamingContext (2020-08-21)
Initializing StreamingContext. To initialize a Spark Streaming program, a StreamingContext object has to be created; it is the main entry point of all Spark Streaming functionality. A StreamingContext object can be created from a SparkConf object, as in the sketch below.
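The preview cuts off at val con…; presumably it continues with the standard initialization from the Spark docs, which completed would read (appName and master are placeholders for your application name and master URL):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))  // Seconds(1) is the batch interval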
Fine-grained working mechanism of Spark Streaming (2020-08-21)
Coarse-grained working mechanism of Spark Streaming (2020-08-21)
Working mechanism, coarse-grained: Spark Streaming receives a live input data stream, cuts the data into small blocks according to a configured time interval (the batch interval), and then hands those small blocks to the Spark engine for processing.
Using spark-submit (2020-08-21)
Use spark-submit to submit our Spark application, as you would in production. First open port 9999:

[hadoop@hadoop000 ~]$ nc -lk 9999

Then, from [hadoop@hadoop000 bin], run:

./spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
…
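The command is truncated; the remaining arguments are presumably the examples jar plus the hostname and port that NetworkWordCount expects. A completed form under those assumptions:

# NOTE: the jar path and Spark/Scala versions below are assumptions; adjust to your install.
./spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar \
hadoop000 9999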
Spark Streaming overview (2020-08-21)
My own definition of Spark Streaming: take data from different sources, process it with Spark Streaming, and write the results out to an external file system.

Characteristics:
- low latency
- recovers efficiently from failures (fault-tolerant)
- can run on hundreds or thousands of nodes
- can combine batch processing, machine learning, graph processing, and other sub-frameworks with Spark Streaming

Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.