Guaranteeing Message Processing

This article takes a close look at how Apache Storm ensures that every message coming off a spout is fully processed. Through its anchoring mechanism and by tracking the state of each message tree, Storm achieves reliable message delivery. It also covers how to use Storm's reliability API and how data loss is avoided under different failure scenarios.

Storm guarantees that each message coming off a spout will be fully processed. This page describes how Storm accomplishes this guarantee and what you have to do as a user to benefit from Storm's reliability capabilities.

What does it mean for a message to be "fully processed"?

A tuple coming off a spout can trigger thousands of tuples to be created based on it. Consider, for example, the streaming word count topology:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
                                               22133,
                                               "sentence_queue",
                                               new StringScheme()));
builder.setBolt("split", new SplitSentence(), 10)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 20)
        .fieldsGrouping("split", new Fields("word"));

This topology reads sentences off of a Kestrel queue, splits each sentence into its constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word. The tree of messages looks something like this:

(Figure: tuple tree)

Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
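
For example, a topology could raise the timeout like this when building its configuration (the value of 60 is just an illustration):

Config conf = new Config();
// Give each spout tuple's tree up to 60 seconds to complete before it is failed.
conf.setMessageTimeoutSecs(60);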

What happens if a message is fully processed or fails to be fully processed?

To understand this question, let's take a look at the lifecycle of a tuple coming off of a spout. For reference, here is the interface that spouts implement (see the Javadoc for more information):

public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}

First, Storm requests a tuple from the Spout by calling the nextTuple method on the Spout. The Spout uses the SpoutOutputCollector provided in the open method to emit a tuple to one of its output streams. When emitting a tuple, the Spout provides a "message id" that will be used to identify the tuple later. For example, the KestrelSpout reads a message off of the kestrel queue and emits as the "message id" the id provided by Kestrel for the message. Emitting a message to the SpoutOutputCollector looks like this:

_collector.emit(new Values("field1", "field2", 3) , msgId);

Next, the tuple gets sent to consuming bolts and Storm takes care of tracking the tree of messages that is created. If Storm detects that a tuple is fully processed, Storm will call the ack method on the originating Spout task with the message id that the Spout provided to Storm. Likewise, if the tuple times out, Storm will call the fail method on the Spout. Note that a tuple will be acked or failed by the exact same Spout task that created it, so even when a Spout is executing many tasks across the cluster, a tuple is never acked or failed by a different task than the one that created it.

Let's use KestrelSpout again to see what a Spout needs to do to guarantee message processing. When KestrelSpout takes a message off the Kestrel queue, it "opens" the message. This means the message is not actually taken off the queue yet, but instead placed in a "pending" state waiting for acknowledgement that the message is completed. While in the pending state, a message will not be sent to other consumers of the queue. Additionally, if a client disconnects, all pending messages for that client are put back on the queue. When a message is opened, Kestrel provides the client with the data for the message as well as a unique id for the message. The KestrelSpout uses that exact id as the "message id" for the tuple when emitting the tuple to the SpoutOutputCollector. Sometime later on, when ack or fail is called on the KestrelSpout, the KestrelSpout sends an ack or fail message to Kestrel with the message id to take the message off the queue or have it put back on, respectively.
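
To make the pattern concrete, here is a rough sketch of a reliable spout built around a hypothetical QueueClient with an open/ack/fail protocol like Kestrel's. The QueueClient class and its openMessage, ack, and fail methods are stand-ins for whatever queue client you actually use, not part of Storm's API:

public class ReliableQueueSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    QueueClient _queue;  // hypothetical client for a queue with an open/pending protocol

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _queue = new QueueClient("queue.example.com", 22133);
    }

    public void nextTuple() {
        // "Open" the next message: it stays pending on the queue until acked or failed.
        QueueClient.Message msg = _queue.openMessage();
        if(msg != null) {
            // Use the queue's own id as the Storm message id so ack/fail can find it later.
            _collector.emit(new Values(msg.data()), msg.id());
        }
    }

    public void ack(Object msgId) {
        _queue.ack(msgId);   // message is done; remove it from the queue for good
    }

    public void fail(Object msgId) {
        _queue.fail(msgId);  // put the message back on the queue so it gets replayed
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}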

What is Storm's reliability API?

There are two things you have to do as a user to benefit from Storm's reliability capabilities. First, you need to tell Storm whenever you're creating a new link in the tree of tuples. Second, you need to tell Storm when you have finished processing an individual tuple. By doing both of these things, Storm can detect when the tree of tuples is fully processed and can ack or fail the spout tuple appropriately. Storm's API provides a concise way of doing both of these tasks.

Specifying a link in the tuple tree is called anchoring. Anchoring is done at the same time you emit a new tuple. Let's use the following bolt as an example. This bolt splits a tuple containing a sentence into a tuple for each word:

public class SplitSentence extends BaseRichBolt {
    OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getString(0);
        for(String word: sentence.split(" ")) {
            _collector.emit(tuple, new Values(word));
        }
        _collector.ack(tuple);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

Each word tuple is anchored by specifying the input tuple as the first argument to emit. Since the word tuple is anchored, the spout tuple at the root of the tree will be replayed later on if the word tuple fails to be processed downstream. In contrast, let's look at what happens if the word tuple is emitted like this:

_collector.emit(new Values(word));

Emitting the word tuple this way causes it to be unanchored. If the tuple fails to be processed downstream, the root tuple will not be replayed. Depending on the fault-tolerance guarantees you need in your topology, sometimes it's appropriate to emit an unanchored tuple.

An output tuple can be anchored to more than one input tuple. This is useful when doing streaming joins or aggregations. A multi-anchored tuple failing to be processed will cause multiple tuples to be replayed from the spouts. Multi-anchoring is done by specifying a list of tuples rather than just a single tuple. For example:

List<Tuple> anchors = new ArrayList<Tuple>();
anchors.add(tuple1);
anchors.add(tuple2);
_collector.emit(anchors, new Values(1, 2, 3));

Multi-anchoring adds the output tuple into multiple tuple trees. Note that it's also possible for multi-anchoring to break the tree structure and create tuple DAGs, like so:

(Figure: tuple DAG)

Storm's implementation works for DAGs as well as trees (pre-release it only worked for trees, and the name "tuple tree" stuck).

Anchoring is how you specify the tuple tree -- the next and final piece to Storm's reliability API is specifying when you've finished processing an individual tuple in the tuple tree. This is done by using the ack and fail methods on the OutputCollector. If you look back at the SplitSentence example, you can see that the input tuple is acked after all the word tuples are emitted.

You can use the fail method on the OutputCollector to immediately fail the spout tuple at the root of the tuple tree. For example, your application may choose to catch an exception from a database client and explicitly fail the input tuple. By failing the tuple explicitly, the spout tuple can be replayed faster than if you waited for the tuple to time-out.
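
For example, a bolt that writes each tuple to a database might fail the input tuple explicitly when the write throws (insertRecord here is a placeholder for your own database code):

public void execute(Tuple tuple) {
    try {
        insertRecord(tuple.getString(0));   // placeholder for your own database write
        _collector.ack(tuple);
    } catch(Exception e) {
        // Fail explicitly so the spout tuple is replayed right away
        // rather than waiting for the message timeout to expire.
        _collector.fail(tuple);
    }
}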

Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack/fail every tuple, the task will eventually run out of memory.

A lot of bolts follow a common pattern of reading an input tuple, emitting tuples based on it, and then acking the tuple at the end of the execute method. These bolts fall into the categories of filters and simple functions. Storm has an interface called BasicBolt that encapsulates this pattern for you. The SplitSentence example can be written as a BasicBolt as follows:

public class SplitSentence extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        for(String word: sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

This implementation is simpler than the implementation from before and is semantically identical. Tuples emitted to BasicOutputCollector are automatically anchored to the input tuple, and the input tuple is acked for you automatically when the execute method completes.

In contrast, bolts that do aggregations or joins may delay acking a tuple until after it has computed a result based on a bunch of tuples. Aggregations and joins will commonly multi-anchor their output tuples as well. These things fall outside the simpler pattern of IBasicBolt.

How do I make my applications work correctly given that tuples can be replayed?

As always in software design, the answer is "it depends." Storm 0.7.0 introduced the "transactional topologies" feature, which enables you to get fully fault-tolerant exactly-once messaging semantics for most computations. Read more about transactional topologies here.

How does Storm implement reliability in an efficient way?

A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message. You can set the number of acker tasks for a topology in the topology configuration using Config.TOPOLOGY_ACKERS. Storm defaults TOPOLOGY_ACKERS to one task -- you will need to increase this number for topologies processing large amounts of messages.
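
For example, a topology that processes a heavy message volume might raise the acker count when building its configuration (the value of 4 is just an illustration):

Config conf = new Config();
conf.setNumAckers(4);   // run four acker tasks instead of the default of one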

The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64 bit id. These ids are used by ackers to track the tuple DAG for every spout tuple.

Every tuple knows the ids of all the spout tuples whose tuple trees it is part of. When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular, it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".

For example, if tuples "D" and "E" were created based on tuple "C", here's how the tuple tree changes when "C" is acked:

(Figure: what happens on an ack)

Since "C" is removed from the tree at the same time that "D" and "E" are added to it, the tree can never be prematurely completed.

There are a few more details to how Storm tracks tuple trees. As mentioned already, you can have an arbitrary number of acker tasks in a topology. This leads to the following question: when a tuple is acked in the topology, how does it know to which acker task to send that information?

Storm uses mod hashing to map a spout tuple id to an acker task. Since every tuple carries with it the spout tuple ids of all the trees it exists within, it knows which acker tasks to communicate with.

Another detail of Storm is how the acker tasks track which spout tasks are responsible for each spout tuple they're tracking. When a spout task emits a new tuple, it simply sends a message to the appropriate acker telling it that its task id is responsible for that spout tuple. Then when an acker sees a tree has been completed, it knows to which task id to send the completion message.

Acker tasks do not track the tree of tuples explicitly. For large tuple trees with tens of thousands of nodes (or more), tracking all the tuple trees could overwhelm the memory used by the ackers. Instead, the ackers take a different strategy that only requires a fixed amount of space per spout tuple (about 20 bytes). This tracking algorithm is the key to how Storm works and is one of its major breakthroughs.

An acker task stores a map from a spout tuple id to a pair of values. The first value is the task id that created the spout tuple which is used later on to send completion messages. The second value is a 64 bit number called the "ack val". The ack val is a representation of the state of the entire tuple tree, no matter how big or how small. It is simply the xor of all tuple ids that have been created and/or acked in the tree.

When an acker task sees that an "ack val" has become 0, it knows that the tuple tree is completed. Since tuple ids are random 64 bit numbers, the chances of an "ack val" accidentally becoming 0 are extremely small. If you work the math, at 10K acks per second, it will take 50,000,000 years until a mistake is made. And even then, it will only cause data loss if that tuple happens to fail in the topology.
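
To make the arithmetic concrete, here is an illustrative (not Storm-internal) walk-through of the XOR bookkeeping for the "C" -> "D", "E" example above, treating "C" as the spout tuple and using made-up ids:

long c = 0x1111L, d = 0x2222L, e = 0x4444L;   // made-up 64 bit tuple ids

long ackVal = 0;
ackVal ^= c;          // spout emits "C": its id is XORed into the ack val
ackVal ^= c ^ d ^ e;  // "C" is acked, having anchored "D" and "E": "C"'s id cancels out
ackVal ^= d;          // "D" is acked with no new children
ackVal ^= e;          // "E" is acked with no new children
// ackVal is now 0, so the acker knows the tree is complete and notifies the spout task.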

Now that you understand the reliability algorithm, let's go over all the failure cases and see how in each case Storm avoids data loss:

  • A tuple isn't acked because the task died: In this case the spout tuple ids at the root of the trees for the failed tuple will time out and be replayed.
  • Acker task dies: In this case all the spout tuples the acker was tracking will time out and be replayed.
  • Spout task dies: In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.

As you have seen, Storm's reliability mechanisms are completely distributed, scalable, and fault-tolerant.

Tuning reliability

Acker tasks are lightweight, so you don't need very many of them in a topology. You can track their performance through the Storm UI (component id "__acker"). If the throughput doesn't look right, you'll need to add more acker tasks.

If reliability isn't important to you -- that is, you don't care about losing tuples in failure situations -- then you can improve performance by not tracking the tuple tree for spout tuples. Not tracking a tuple tree halves the number of messages transferred since normally there's an ack message for every tuple in the tuple tree. Additionally, it requires fewer ids to be kept in each downstream tuple, reducing bandwidth usage.

There are three ways to remove reliability. The first is to set Config.TOPOLOGY_ACKERS to 0. In this case, Storm will call the ack method on the spout immediately after the spout emits a tuple. The tuple tree won't be tracked.
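
Continuing the configuration sketch from earlier, this looks like:

conf.setNumAckers(0);   // no acker tasks: spout tuples are acked as soon as they are emitted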

The second way is to remove reliability on a message by message basis. You can turn off tracking for an individual spout tuple by omitting a message id in the SpoutOutputCollector.emit method.
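
For example, the earlier spout emit with the message id simply omitted:

_collector.emit(new Values("field1", "field2", 3));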

Finally, if you don't care if a particular subset of the tuples downstream in the topology fail to be processed, you can emit them as unanchored tuples. Since they're not anchored to any spout tuples, they won't cause any spout tuples to fail if they aren't acked.

<think>好的,我现在需要回答用户关于Linux SPI驱动中spi_sync函数的实现和用法的问题。首先,我得回忆一下自己对这个函数的理解,以及查阅用户提供的引用资料来确保准确性。 用户的问题主要分两部分:实现和用法。我需要分别处理这两个方面,同时结合提供的引用内容来支撑回答。 首先,关于spi_sync的实现。根据引用[2]和引用[3],spi_sync和spi_async都是核心函数,但spi_sync是同步的,而spi_async是异步的。在引用[3]中提到,spi_async没有完成量,不会休眠,适用于中断上下文。而spi_sync可能使用完成量,会阻塞进程直到传输完成。我需要确认这一点是否正确,可能需要进一步回忆内核代码的结构。例如,spi_sync内部可能会调用spi_async,然后等待完成。这点需要说明清楚。 另外,引用[4]提到在spi_sync传输期间,片选信号可能一直保持有效,特别是当在一个message中处理多个transfer时。这可能涉及到message的结构,每个transfer可能有不同的片选控制,但整个message的传输期间片选是否持续有效,这需要解释清楚。 关于spi_sync的用法,用户需要知道如何构造spi_message和spi_transfer结构体。比如,如何填充tx_buf和rx_buf,设置长度len,以及可能的延迟设置。引用[1]提到一次传输的数据大小可能受限于底层实现,比如最大65535字节,这可能需要在用法中提及,避免用户设置过大的数据块。 然后,我需要将实现部分拆分成几个步骤:初始化message,添加transfer,调用spi_sync,并处理可能的错误。同时,代码示例可能会有帮助,但用户提供的引用中没有具体代码,但根据常规的SPI驱动开发,给出一个示例是必要的,但需要确保正确性。 在引用[4]中提到的,用户可能需要处理指令发送和数据接收的组合,例如发送指令后跟无效字节以保持时钟,从而在接收时丢弃无效数据。这可以作为用法中的一个注意点或示例,说明如何构造这样的传输。 此外,关于同步和异步的选择,需要说明在什么情况下使用spi_sync。比如,当需要确保传输完成后再继续执行后续代码时,使用同步方式更合适,而异步则用于非阻塞场景。 可能的问题点:片选信号的管理,spi_sync是否会保持片选有效。根据引用[4],当在一个message中有多个transfer时,片选可能会在message传输期间保持有效,而每个transfer之间可能有片选的短暂无效。需要明确这一点,或者说明如何通过设置transfer的cs_change来控制片选行为。 另外,错误处理也是重要部分,例如spi_sync返回值的检查,可能返回的错误代码如-EBUSY(当总线被锁时,引用[3]提到)。 总结结构: 1. 函数作用:同步传输,阻塞直到完成。 2. 实现机制:调用spi_async,然后等待完成量,可能涉及总线锁(引用[3]的自旋锁)。 3. 使用步骤: a. 初始化message。 b. 设置transfer结构体,包括缓冲区、长度、cs_change等。 c. 添加transfer到message。 d. 调用spi_sync。 4. 注意事项:数据长度限制(引用[1]),片选控制(引用[4]),错误处理。 5. 示例代码:构造发送和接收的例子,可能包括发送指令后接收数据的场景。 需要确保代码示例正确,例如使用spi_message_init,spi_message_add_tail,以及处理tx和rx缓冲区。同时,注意内核API的版本是否影响用法,但用户没有提到特定版本,所以假设是通用的。 最后,生成相关问题,可能涉及同步与异步的区别,如何优化传输,或者如何处理片选信号等。</think>### spi_sync函数的实现与用法 #### 一、`spi_sync`函数的作用 `spi_sync`是Linux SPI驱动中用于**同步传输数据**的核心函数。它会阻塞当前进程,直到数据传输完成,适用于需要严格时序控制的场景[^2][^3]。 --- #### 二、实现机制 1. **同步流程** - `spi_sync`内部调用`spi_async`发起异步传输,然后通过`wait_for_completion`等待传输完成[^3]。 - 使用**自旋锁**保护总线资源(`bus_lock_spinlock`),避免多线程竞争[^3]。 - 若总线被占用(`bus_lock_flag`为真),返回`-EBUSY`[^3]。 2. **片选信号控制** - 在一个`spi_message`中,若多个`spi_transfer`通过`cs_change`标志控制片选信号,则片选可能在传输间隙短暂失效。若需持续有效,可将多个操作合并为一个`transfer`[^4]。 --- #### 三、使用方法 1. **基本步骤** ```c // 1. 初始化spi_message struct spi_message msg; spi_message_init(&msg); // 2. 配置spi_transfer(例如发送16字节) struct spi_transfer xfer = { .tx_buf = tx_buffer, .rx_buf = rx_buffer, .len = 16, .cs_change = 0, // 片选是否在transfer间失效 }; // 3. 将transfer添加到message spi_message_add_tail(&xfer, &msg); // 4. 调用spi_sync同步传输 int ret = spi_sync(spi_device, &msg); if (ret < 0) { // 错误处理(如超时、总线冲突) } ``` 2. **数据长度限制** - 单次传输长度通常不超过$0xFFFF$(65535字节),具体取决于控制器实现[^1]。 3. **混合收发场景** - 发送指令后接收数据时,可通过填充无效字节保持时钟: ```c // 示例:发送1字节指令,接收4字节数据 u8 tx_buf[5] = {0xA5, 0x00, 0x00, 0x00, 0x00}; // 指令+4个无效字节 u8 rx_buf[5]; struct spi_transfer xfer = { .tx_buf = tx_buf, .rx_buf = rx_buf, .len = 5, }; // 调用spi_sync后,丢弃rx_buf[0],保留后4字节有效数据[^4] ``` --- #### 四、典型问题与注意事项 1. **错误处理** - 检查返回值:`-EBUSY`表示总线被占用,`-EIO`表示传输失败。 2. **性能影响** - 同步传输会阻塞进程,高频调用可能影响实时性,此时可考虑`spi_async`[^3]。 3. **片选信号行为** - 若需在整个传输期间保持片选有效,需确保`spi_transfer.cs_change = 0`[^4]。 ---
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值