Real-Time Processing and Stream Processing

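This article looks at the concepts, characteristics, and example applications of real-time processing and stream processing. It explains what real-time systems and soft real-time processing mean, and contrasts real-time and stream processing in terms of processing speed and time constraints. It also covers the basic principle of stream processing and where it shows up in practice, such as stream processing with the Storm framework and Spark's support for streaming computation, and notes how these technologies can be used to build soft real-time systems.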


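Preface: As a programmer, I keep hearing new technical buzzwords: big data, cloud computing, real-time processing, stream processing, in-memory computing... But what do people actually mean when they use these fashionable terms? I happened to come across a good post on the subject, so here is a summary of the difference between real-time processing and stream processing.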

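Body: To talk about real-time processing, we first have to mention real-time systems. A real-time system is one that responds to requests within strict time limits. For example, if a system can strictly guarantee that it processes a NASDAQ stock quote arriving from the network within 10 milliseconds, it counts as a real-time system. Whether it gets there in software, in hardware, or through some particular design does not matter.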

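This looks simple, but such systems are very hard to build in the real world, especially in software. Your process can be preempted by another process at any time, and the CPU scheduler cannot guarantee that your process gets the time and resources it needs to respond within a strict deadline. That is why real-time operating system kernels exist. Real-world examples of real-time systems that come to mind are high-end software systems such as military missile control systems and the Space Shuttle.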

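So what is real-time processing (or real-time computing)? It is similar to a real-time system, but the software industry does not seem to have a clear definition of "real-time". For example, many people talk about "real-time trading" simply because market data changes constantly and decisions are often made within milliseconds. In practice, software systems usually settle for soft real-time, where the deadline is a probability rather than an absolute guarantee. One example: Amazon requires all of its software subsystems to either return a result or fail immediately within 100-200 milliseconds for 99% of requests.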
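The 99% rule above can be checked mechanically against measured latencies. The following is a minimal Python sketch, not Amazon's actual tooling: the function names, the 200 ms default, and the sample latencies are all assumptions made for illustration.

```python
def fraction_within_deadline(latencies_ms, deadline_ms):
    """Return the fraction of requests that finished within the deadline."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    on_time = sum(1 for latency in latencies_ms if latency <= deadline_ms)
    return on_time / len(latencies_ms)


def meets_soft_deadline(latencies_ms, deadline_ms=200.0, required_fraction=0.99):
    """Soft real-time check: the deadline must hold for a fraction of requests,
    not for every single one."""
    return fraction_within_deadline(latencies_ms, deadline_ms) >= required_fraction


# Hypothetical measurements: 99 fast requests and one slow outlier.
samples = [50.0] * 99 + [500.0]
print(meets_soft_deadline(samples))  # True: 99% finished within 200 ms
```

A hard real-time check would instead require every sample to meet the deadline, which the single 500 ms outlier above would fail.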

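With real-time processing covered, let's look at stream processing. As the name suggests, stream processing means that the system computes continuously as an endless stream of data flows through it. Stream processing therefore has no strict time limit: it may take a while between data entering the system and a result coming out. The only constraint is that, over the long run, the system's output rate must be at least as high as its input rate. Otherwise data would pile up inside the system (where else would it go?), and whether it is held in memory, on flash, or on disk, the space will run out sooner or later. It is like an avalanche: the system gets slower and slower while the backlog keeps growing.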
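The rate constraint can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (discrete ticks, a fixed per-tick processing capacity, and a hypothetical function name `simulate_backlog`), not a model of any particular framework.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return the queue length after each tick.

    arrivals: number of items arriving in each tick (hypothetical workload).
    capacity_per_tick: maximum number of items the system can process per tick.
    """
    backlog = 0
    history = []
    for arrived in arrivals:
        backlog += arrived
        backlog -= min(backlog, capacity_per_tick)  # process as much as possible
        history.append(backlog)
    return history


# Bursty input whose long-term rate equals the capacity: the queue drains.
print(simulate_backlog([30, 0, 0, 30, 0, 0], capacity_per_tick=10))
# [20, 10, 0, 20, 10, 0]

# Input that is persistently faster than the capacity: the queue grows forever.
print(simulate_backlog([12] * 6, capacity_per_tick=10))
# [2, 4, 6, 8, 10, 12]
```

In the first run items wait for a while but the backlog stays bounded, which is acceptable for stream processing. In the second run the backlog grows by two items every tick, so any finite storage is eventually exhausted.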

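So we can say that Storm is a framework for stream processing systems. If our code can guarantee a bounded processing time for every Bolt in a Storm Topology, then we have effectively used Storm to build a (soft) real-time system. As an aside, Spark, which is primarily an in-memory computing framework, gained support for stream processing once the Spark Streaming subproject was added: it splits the data stream into slices and turns them into RDDs for subsequent computation. (Before that, Spark always took a fixed chunk of data as input.)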
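The idea of cutting a stream into small batches can be sketched in plain Python. This only illustrates the concept and is not Spark's API: `micro_batches`, `word_counts`, the one-second interval, and the sample events are all assumptions, and a real streaming framework also handles distribution and fault tolerance.

```python
from collections import Counter
from itertools import groupby


def micro_batches(events, batch_interval):
    """Group (timestamp, text) events into consecutive time windows.

    Assumes events are sorted by timestamp. Windows that contain no
    events produce no batch.
    """
    def window_of(event):
        return int(event[0] // batch_interval)

    for _, group in groupby(events, key=window_of):
        yield [text for _, text in group]


def word_counts(batch):
    """Ordinary batch computation applied to one slice of the stream."""
    counts = Counter()
    for text in batch:
        counts.update(text.split())
    return counts


# Hypothetical stream of (timestamp in seconds, text) events.
events = [(0.1, "a b"), (0.7, "b"), (1.2, "c a"), (2.5, "a")]
for batch in micro_batches(events, batch_interval=1.0):
    print(dict(word_counts(batch)))
```

Each slice is handed to the same kind of code that would process a fixed data set, which is why a batch engine can be extended to streams this way.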

 

Original: What's the difference between real-time processing and stream processing?

Usually, a system is called a real time system if it has tight deadlines within which a result is guaranteed. For example, you can consider your TV to be a real time processing system: given an analog or digital input, within say 1ms, a corresponding phosphor dot will light up on the screen. In the context of software systems, a system is usually called a real time system if it has responses that are guaranteed within hard "real-world" time deadlines. For example, a system that guarantees the processing of a NASDAQ stock quote coming in from the network within 10 ms would be considered a real time processing system: whether this is achieved by using a software architecture that utilizes continuous (stream) processing or one shot processing in hardware is immaterial. The fact that there is a reasonably small real-world guaranteed deadline for the processing makes it a real time system.

In practice though, real time systems are extremely hard to implement using common software systems. For example, the vanilla Linux kernel isn't a real time kernel: certain operations such as process scheduling, network packet processing etc. are implemented using algorithms that don't guarantee a hard time limit. E.g., if your process is preempted from CPU resources by a higher priority process, the scheduler may not give your process the CPU resources it needs to guarantee a response in the given deadline (depending on the scheduling algorithm). The same thing applies to network packets. There are, of course, flavors of the kernel available that provide real time scheduling guarantees for processes etc. (QNX [1] comes to mind). Software systems in this area usually go for a flavor of real time processing called soft real time computing where the deadline is not an absolute but a probability. For example, Amazon requires all the software subcomponents on its page to provide a result or fail within 100-200ms for 99% of all requests. This gives it a soft real time guarantee that a page will render within a given time limit.

Stream processing on the other hand refers to a method of continuous computation that happens as data is flowing through the system. There are no compulsory time limitations in stream processing. For example, a system that simply output the count of words present in a Tweet for 99.9% of the tweets it encountered but output the complete works of Shakespeare for the remaining 0.1% of tweets is a valid stream processing system. There is no fixed time deadline on the output of the system when an input is received: the data is processed as it comes in and sometimes data might be awaiting processing. The only constraint on such a stream processing system is that its long term output rate should be faster or at least equal to the long term data input rate (otherwise the storage requirements of the system grow without bound). Additionally, it must have enough memory to store queued inputs should it be stuck while processing any item in the input stream.

Given this context, I'm sure it's easy to figure out that Storm is a stream processing system. You can use Storm to develop a (soft) real time system if you can place guarantees on the processing duration for all inputs at every stage of the topology.
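To make the per-stage guarantee concrete, here is a minimal Python sketch of a pipeline in which every stage has a time budget. It does not use Storm's API: `run_with_budgets`, `worst_case_latency`, the stage names, and the budgets are hypothetical, and the code only detects overruns after the fact rather than enforcing them. Queueing time between stages is ignored.

```python
import time


def run_with_budgets(item, stages):
    """Pass item through each (name, function, budget_seconds) stage in order.

    Returns the final result and the names of stages that ran over budget.
    """
    violations = []
    for name, stage_fn, budget_s in stages:
        start = time.perf_counter()
        item = stage_fn(item)
        elapsed = time.perf_counter() - start
        if elapsed > budget_s:
            violations.append(name)
    return item, violations


def worst_case_latency(stages):
    """If every stage keeps to its budget, end-to-end processing time is
    bounded by the sum of the budgets."""
    return sum(budget_s for _, _, budget_s in stages)


# Hypothetical two-stage pipeline: split a line into words, then count them.
stages = [("split", str.split, 0.5), ("count", len, 0.5)]
result, violations = run_with_budgets("to be or not to be", stages)
print(result, violations, worst_case_latency(stages))
```

The bound holds only as long as no stage reports a violation, which is why the guarantee has to cover all inputs at every stage.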

Reposted from: https://www.cnblogs.com/xiaomaohai/p/6157683.html
