How to Load Constants in Assembly for ARM Architecture

    ARM is a 32-bit CPU architecture where every instruction is 32 bits long. Any constants which are part of an instruction must be encoded within the 32 bits of the given instruction and this naturally limits the range of constants that can be represented in one instruction. This post will show you how we can deal with these limitations and how the latest revision of the ARM architecture (ARMv7) provides a simple and efficient solution.

    Most arithmetic and logical ARM instructions accept 3 parameters:

  • The destination: always a register.
  • Operand 1: always a register.
  • Operand 2: a register, an immediate constant value or a shifted register. We'll cover shifted registers in a future post. For now, we're only interested in the constants.

    Examples of such instructions are:
    add    r0, r1, r2    @ r0 = r1 + r2
    sub    r0, r1, #3    @ r0 = r1 - 3

    An Operand 2 immediate must obey the following rule to fit in the instruction: an 8-bit value rotated right by an even number of bits between 0 and 30 (inclusive). This allows for constants such as 0xFF (0xFF rotated right by 0), 0xFF00 (0xFF rotated right by 24) or 0xF000000F (0xFF rotated right by 4).
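This rule is easy to check mechanically. The following Python sketch (the function names are mine, not from the post) tests whether a 32-bit value is a valid Operand 2 immediate:

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def is_operand2_immediate(value):
    """True if `value` equals an 8-bit constant rotated right by an
    even number of bits between 0 and 30 (the Operand 2 rule)."""
    for rotation in range(0, 32, 2):
        # Undo the rotation: rotating left by `rotation` is the same
        # as rotating right by (32 - rotation) % 32.
        if ror32(value, (32 - rotation) % 32) <= 0xFF:
            return True
    return False
```

For instance, `is_operand2_immediate(0xF000000F)` is true (0xFF rotated right by 4), while `is_operand2_immediate(0x101)` is false: its two set bits are 8 positions apart, so no rotation can fit them into 8 bits.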

    Operand 2 immediates are also valid immediates for mov instructions, making it possible to move constant values into registers without performing any other computation:

    mov    r0, #0xFF0    @ r0 = 0xFF0

    In software - especially in languages like C - constants tend to be small. When they are not small they tend to be bit masks. Operand 2 immediates provide a reasonable compromise between constant coverage and encoding space; most common constants can be encoded directly.

Loading a Constant from Memory

    What happens if you need a constant which cannot be expressed as an Operand 2 immediate? The constant has to be moved into a register before use and there are many ways to do so. The traditional solution is to load the constant from memory.

    Loading a value from memory requires a pointer to the memory location of the value. Pointers need to be held in a register, which brings us back to the same problem: an extra register is needed. However, on ARM the program counter (pc) can generally be used like any other register, and can therefore serve as the base pointer for a load. This allows you to store the constant relative to the instruction that loads it. Loading the constant into a register then becomes something like this:

    ldr    r0, [pc, #offset]

    Here #offset is the offset in bytes of the constant relative to the program counter (PC). When executing an ARM instruction, PC reads as the address of the current instruction plus 8. #offset can take any value between -4095 and +4095 (inclusive).

    Knowing where to store the constants in memory (and keeping track of them) can be a tedious task. Thankfully most assemblers provide pseudo-instructions to simplify the operation. For example, in GNU assembler you can write this:

returns_0x12345678:
    ldr    r0, =0x12345678
    bx    lr               @ function return

    The above will assemble to this:

returns_0x12345678:
    ldr    r0, [pc, #0]    @ remember pc is 8 bytes ahead
    bx    lr               @ function return
    .word    0x12345678

    In fact the ldr= pseudo-instruction is a bit more clever than it looks: it checks whether the given constant can be represented as an Operand 2 immediate and, if it can, generates a mov instruction instead. A mov is faster than an ldr since there is no need to read the constant from memory, and it also saves the memory a literal pool entry would occupy.
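For example, a constant that fits the Operand 2 rule never touches the literal pool. With GNU assembler, something like this (a sketch; the exact output depends on the assembler):

```asm
    ldr    r0, =0xFF0    @ 0xFF0 is 0xFF rotated right by 28: a valid Operand 2 immediate
```

will typically assemble to:

```asm
    mov    r0, #0xFF0    @ no literal pool entry, no memory read
```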

Loading a Constant from the Instruction Stream: the ARMv7 Way

    As mentioned earlier, there are other ways to load a constant. In the latest version of the ARM architecture, ARMv7, two new instructions were introduced to improve the situation:

  • movw, or move wide, will move a 16-bit constant into a register, implicitly zeroing the top 16 bits of the target register.
  • movt, or move top, will move a 16-bit constant into the top half of a given register without altering the bottom 16 bits.

    Moving an arbitrary 32-bit constant is now as simple as this:
    movw      r0, #0x5678      @ r0 = 0x00005678
    movt      r0, #0x1234      @ r0 = (r0 & 0x0000FFFF) | 0x12340000 (=0x12345678)

    Note that the order matters since movw will zero the upper 16 bits. Here again the GNU assembler provides some syntactic sugar: the prefixes :upper16: and :lower16: allow you to extract the corresponding half from a 32-bit constant:

    .equ    label, 0x12345678
    movw    r0, #:lower16:label
    movt    r0, #:upper16:label

    While this approach takes two instructions, it does not require any extra space to store the constant, so the movw/movt method and the ldr method end up using the same amount of memory. Memory bandwidth is precious, and the movw/movt approach avoids an extra read on the data side, not to mention that the read could have missed the cache.

    If you know you can use it, movw/movt is the recommended way to load a 32-bit constant. However, if the constant can be encoded as an Operand 2 immediate (an 8-bit value, rotated right if necessary), use it directly in the instruction and avoid tying up an extra register.

Two additional notes:

        1: According to ARMv7-M_ARM.pdf, page 121, "Modified Immediate", an 8-bit constant can only be shifted by 0 to 24 bits, so the example 0xF000000F given above is not encodable on ARM Cortex-M. However, it is also possible to use constants of one of these forms:

%00000000abcdefgh00000000abcdefgh, %abcdefgh00000000abcdefgh00000000 or %abcdefghabcdefghabcdefghabcdefgh.
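The ARMv7-M rule from this note can be sketched in Python as well (a sketch following the note's reading of the manual; the function name is mine):

```python
def is_armv7m_modified_immediate(value):
    """True if a 32-bit value matches one of the ARMv7-M 'modified
    immediate' forms described in the note above."""
    value &= 0xFFFFFFFF
    low = value & 0xFF
    mid = (value >> 8) & 0xFF
    # Replicated-byte forms: %00000000abcdefgh00000000abcdefgh,
    # %abcdefgh00000000abcdefgh00000000 and %abcdefghabcdefghabcdefghabcdefgh.
    if value == low * 0x00010001:       # 0x00XY00XY
        return True
    if value == mid * 0x01000100:       # 0xXY00XY00
        return True
    if value == low * 0x01010101:       # 0xXYXYXYXY
        return True
    # Otherwise: an 8-bit constant shifted left by 0 to 24 bits.
    for shift in range(0, 25):
        if (value >> shift) <= 0xFF and value == (value >> shift) << shift:
            return True
    return False
```

Under this rule, 0xFF000000 (0xFF shifted left by 24) is encodable but 0xF000000F is not, matching the note.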

        2: In some cases it can be quicker to load immediate 32-bit values from memory than to use MOVW+MOVT. This happens when two consecutive load instructions can be pipelined: the first load takes 2 clock cycles and the next takes only 1. This requires the two load instructions to be right next to each other, with no other instructions in between; otherwise each load takes 2 clock cycles, using 4 clock cycles instead of 3. With MOVW+MOVT, each instruction takes one clock cycle. If you are only loading a single 32-bit immediate value, MOVW+MOVT is recommended.
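The pipelined case from this note would look something like the sketch below (the constants are placeholders, and the cycle counts are those claimed in the note; they depend on the specific core):

```asm
    ldr    r0, =0x12345678    @ 2 cycles when the following load pipelines
    ldr    r1, =0x9ABCDEF0    @ back-to-back load: 1 cycle (3 cycles total)

    movw   r2, #0x5678        @ 1 cycle
    movt   r2, #0x1234        @ 1 cycle (2 cycles for a single constant)
```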

original post at https://community.arm.com/processors/b/blog/posts/how-to-load-constants-in-assembly-for-arm-architecture

