TensorFlow Performance: Model Optimization

This article looks at how to improve performance when training machine learning models by optimizing TensorFlow code, using efficient data input pipelines, and optimizing GPU inference. It introduces the techniques in the TensorFlow Model Optimization Toolkit, including reduced parameter counts, reduced representational precision, and updated model topologies, and discusses how model quantization affects inference efficiency on mobile devices.


Performance

Performance is an important consideration when training machine learning models.

Performance speeds up and scales research while also providing end users with near-instant predictions.

This section provides details on the high-level APIs to use, along with best practices for building and training high-performance models and for quantizing models to achieve the lowest latency and highest throughput for inference.


The TensorFlow Model Optimization Toolkit is a set of techniques for optimizing models for inference:

  • Post-training quantization, which describes quantization applied after training.

 

XLA (Accelerated Linear Algebra) is an experimental compiler for linear algebra that optimizes TensorFlow computations. The following guides explore XLA:

  • XLA Overview, which introduces XLA.
  • Broadcasting Semantics, which describes XLA's broadcasting semantics.
  • Developing a new back end for XLA, which explains how to re-target TensorFlow in order to optimize the performance of the computational graph for particular hardware.
  • Using JIT Compilation, which describes the XLA JIT compiler that compiles and runs parts of TensorFlow graphs via XLA in order to optimize performance (a minimal sketch of enabling JIT follows this list).
  • Operation Semantics, which is a reference manual describing the semantics of operations in the ComputationBuilder interface.
  • Shapes and Layout, which details the Shape protocol buffer.
  • Using AOT compilation, which explains tfcompile, a standalone tool that compiles TensorFlow graphs into executable code in order to optimize performance.
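As a quick illustration of the JIT path, the sketch below enables global XLA JIT compilation through the session configuration. This is a minimal sketch assuming the TF 1.x session API used elsewhere in this article, not a tuning recipe; the tiny matmul graph is only a placeholder.

import tensorflow as tf

# Hedged sketch: turn on global XLA JIT compilation via the TF 1.x session config.
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

with tf.Session(config=config) as sess:
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)  # eligible ops may be clustered and compiled by XLA
    print(sess.run(y))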

 

Model optimization

Inference efficiency is a critical issue when deploying machine learning models to mobile devices.

Whereas the computational demand for training grows with the number of models trained on different architectures, the computational demand for inference grows in proportion to the number of users.

The TensorFlow Model Optimization Toolkit minimizes the complexity of inference: the model size, the latency, and the power consumption.

Use cases

Model optimization is useful for:

  • Deploying models to edge devices with restrictions on processing, memory, or power consumption, for example, mobile and Internet of Things (IoT) devices.
  • Reducing the payload size for over-the-air model updates.
  • Enabling execution on hardware constrained by fixed-point operations.
  • Optimizing models for special-purpose hardware accelerators.

Optimization methods

Model optimization uses multiple techniques:

  • Reduced parameter count, for example, pruning and structured pruning (a minimal pruning sketch follows this list).
  • Reduced representational precision, for example, quantization.
  • Updating the original model topology to a more efficient one, with reduced parameters or faster execution, for example, tensor decomposition methods and distillation.
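To make the pruning option concrete, here is a minimal sketch using the TensorFlow Model Optimization Toolkit's Keras pruning API (the tensorflow_model_optimization package). The model architecture, sparsity schedule, and commented-out training call are illustrative assumptions, not settings recommended by this article.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical small Keras model, used only for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the model so that low-magnitude weights are progressively zeroed during training.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',
                     metrics=['accuracy'])

# The UpdatePruningStep callback advances the sparsity schedule every training step.
# pruned_model.fit(x_train, y_train, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# strip_pruning removes the pruning wrappers before exporting the sparse model.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)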

Model quantization

Quantizing deep neural networks uses techniques that allow for reduced precision representations of weights and, optionally, activations for both storage and computation.


Quantization provides several benefits:

  • Support on existing CPU platforms.
  • Quantizing activations reduces memory access costs for reading and storing intermediate activations.
  • Many CPU and hardware accelerator implementations provide SIMD instruction capabilities, which are especially beneficial for quantization.

 

TensorFlow Lite provides several levels of support for quantization.


Post-training quantization quantizes weights and activations post training and is very easy to use. Quantization-aware training allows for training networks that can be quantized with minimal accuracy drop and is only available for a subset of convolutional neural network architectures.

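For reference, a minimal quantization-aware training sketch using the toolkit's current Keras API is shown below. It assumes the tensorflow_model_optimization package and a TF 2.x Keras model, which goes beyond the TF 1.x converter used later in this article; the architecture is a hypothetical placeholder.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical Keras model; quantize_model inserts fake-quantization nodes so the
# network learns weights that stay accurate after conversion to 8-bit.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
# Train as usual; the resulting model can then be exported with the TFLite converter.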

 

Latency and accuracy results

Below are the results of the latency and accuracy of post-training quantization and quantization-aware training on a few models.


All latency numbers are measured on Pixel 2 devices using a single big core.


As the toolkit improves, so will the numbers here:


| Model | Top-1 Accuracy (Original) | Top-1 Accuracy (Post Training Quantized) | Top-1 Accuracy (Quantization Aware Training) | Latency (Original) (ms) | Latency (Post Training Quantized) (ms) | Latency (Quantization Aware Training) (ms) | Size (Original) (MB) | Size (Optimized) (MB) |
|---|---|---|---|---|---|---|---|---|
| Mobilenet-v1-1-224 | 0.709 | 0.657 | 0.70 | 180 | 145 | 80.2 | 16.9 | 4.3 |
| Mobilenet-v2-1-224 | 0.719 | 0.637 | 0.709 | 117 | 121 | 80.3 | 14 | 3.6 |
| Inception_v3 | 0.78 | 0.772 | 0.775 | 1585 | 1187 | 637 | 95.7 | 23.9 |
| Resnet_v2_101 | 0.770 | 0.768 | N/A | 3973 | 2868 | N/A | 178.3 | 44.9 |

Table 1: Benefits of model quantization for select CNN models

Choice of quantization tool

As a starting point, check if the models in the TensorFlow Lite model repository can work for your application. If not, we recommend that users start with the post-training quantization tool since this is broadly applicable and does not require training data.

For cases where the accuracy and latency targets are not met, or hardware accelerator support is important, quantization-aware training is the better option.


Post-training quantization

Post-training quantization is a general technique to reduce the model size while also providing up to 3x lower latency with little degradation in model accuracy. Post-training quantization quantizes weights to 8-bits of precision from floating-point.

This technique is enabled as an option in the TensorFlow Lite model converter:

import tensorflow as tf

# Convert a SavedModel with the TF 1.x TOCO converter and enable
# post-training weight quantization.
converter = tf.contrib.lite.TocoConverter.from_saved_model(saved_model_dir)
converter.post_training_quantize = True
tflite_quantized_model = converter.convert()

# Write the quantized flatbuffer to disk.
open("quantized_model.tflite", "wb").write(tflite_quantized_model)



At inference, weights are converted from 8-bits of precision to floating-point and computed using floating point kernels. This conversion is done once and cached to reduce latency.


To further improve latency, hybrid operators dynamically quantize activations to 8-bits and perform computations with 8-bit weights and activations. This optimization provides latencies close to fully fixed-point inference. However, the outputs are still stored using floating-point, so the speedup with hybrid ops is less than a full fixed-point computation.

Hybrid ops are available for the most compute-intensive operators in a network.

Since weights are quantized post-training, there could be an accuracy loss, particularly for smaller networks. Pre-trained fully quantized models are provided for specific networks in the TensorFlow Lite model repository. It is important to check the accuracy of the quantized model to verify that any degradation in accuracy is within acceptable limits. There is a tool to evaluate TensorFlow Lite model accuracy.

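As an illustration of such an accuracy check, the sketch below runs a quantized .tflite file through the TensorFlow Lite Interpreter over a labelled dataset. The file name and the x_test / y_test arrays are hypothetical placeholders, not artifacts produced earlier in this article.

import numpy as np
import tensorflow as tf

# Hypothetical evaluation data: x_test is (N, H, W, C) float32, y_test is (N,) integer labels.
interpreter = tf.lite.Interpreter(model_path="quantized_model.tflite")
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']

correct = 0
for image, label in zip(x_test, y_test):
    interpreter.set_tensor(input_index, np.expand_dims(image, 0).astype(np.float32))
    interpreter.invoke()
    prediction = np.argmax(interpreter.get_tensor(output_index))
    correct += int(prediction == label)

print("Quantized accuracy:", correct / len(y_test))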

If the accuracy drop is too high, consider using quantization aware training.


Representation for quantized tensors

TensorFlow approaches the conversion of floating-point arrays of numbers into 8-bit representations as a compression problem.

The weights and activation tensors in trained neural network models tend to have values that are distributed across comparatively small ranges (for example, -15 to +15 for weights or -500 to 1000 for image model activations).

And since neural nets tend to be robust at handling noise, the error introduced by quantizing to a small set of values keeps the precision of the overall results within an acceptable threshold.

A chosen representation must also support fast calculations, especially the large matrix multiplications that make up the bulk of the computation while running a model.

 

This is represented with two floats that store the overall minimum and maximum values, corresponding to the lowest and highest quantized values.

Each entry in the quantized array represents a float value in that range, distributed linearly between the minimum and maximum.


For example, with a minimum of -10.0, a maximum of 30.0, and an 8-bit array, the quantized values represent the following:

| Quantized | Float |
|---|---|
| 0 | -10.0 |
| 128 | 10.0 |
| 255 | 30.0 |

Table 2: Example quantized value range
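To show the arithmetic behind Table 2, here is a minimal sketch of the linear mapping between the 8-bit codes and float values. The helper names are illustrative only and are not part of any TensorFlow API.

# Linear quantization over a known [min_val, max_val] range with 8-bit codes.
def quantize(value, min_val=-10.0, max_val=30.0):
    scale = (max_val - min_val) / 255.0
    return int(round((value - min_val) / scale))

def dequantize(code, min_val=-10.0, max_val=30.0):
    scale = (max_val - min_val) / 255.0
    return min_val + code * scale

print(dequantize(0))    # -10.0
print(dequantize(128))  # about 10.08, shown rounded as 10.0 in Table 2
print(dequantize(255))  # 30.0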

The advantages of this representation format are:

  • It efficiently represents an arbitrary magnitude of ranges.
  • The values don't have to be symmetrical.
  • The format represents both signed and unsigned values.
  • The linear spread makes multiplications straightforward.

 

 
