Adding an Op to TensorFlow with Incremental Compilation (No Need to Build TensorFlow)

Hydrion

Article link: https://blog.youkuaiyun.com/u012614287/article/details/90415545

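The Chinese TensorFlow community has a tutorial on adding an op to TensorFlow (hereafter tf): http://www.tensorfly.cn/tfdoc/how_tos/adding_an_op.html However, that tutorial does not explain how to get the op you have written into the TensorFlow framework.
In my own testing, there is no need to build the TensorFlow source. Compiling only the op's C++ and CUDA code is enough to register the op with TensorFlow. Using a CPU implementation as the example, this post records how to write, register, and call an op in TensorFlow.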

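The overall process has three steps:

1. Write the cpp file (if using a GPU, also write a cu file).
2. Compile the cpp file with g++ to produce a *.so file (if using a GPU, first compile the *.cu file with nvcc to produce a *.cu.o file, before compiling the cpp file).
3. Load the op in tf from the *.so file and call it.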

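Step 1: Write the cpp file.
A file that registers an op in tf has three main parts: declaring the op's inputs, outputs, and other attributes; implementing the op; and registering the kernel. A typical example follows.
Create a file named my_add.cc and copy the following code into it.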

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;
// Declare the op's inputs, outputs, and other attributes
REGISTER_OP("MyAdd")
    .Input("x: int32")
    .Input("y: int32")
    .Output("z: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      // The output has the same shape as the first input.
      c->set_output(0, c->input(0));
      return Status::OK();
    });

// Implement the op
#include "tensorflow/core/framework/op_kernel.h"

class MyAddOp : public OpKernel {
 public:
  explicit MyAddOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
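    // Grab the input tensors
    const Tensor& a = context->input(0);
    const Tensor& b = context->input(1);
    // The loop below indexes both inputs, so they must be the same size.
    OP_REQUIRES(context, a.NumElements() == b.NumElements(),
                errors::InvalidArgument("x and y must have the same number of elements"));
    auto A = a.flat<int32>();
    auto B = b.flat<int32>();
    // Create an output tensor
    Tensor* output_tensor = nullptr;
    OP_REQUIRES_OK(context, context->allocate_output(0, a.shape(),
                                                     &output_tensor));
    auto output_flat = output_tensor->flat<int32>();

    // Add element-wise, except for the first element, which is set to 0.
    const int N = A.size();

    for (int i = 1; i < N; i++) {
      output_flat(i) = A(i) + B(i);
    }
    // Guard against empty inputs before writing element 0.
    if (N > 0) output_flat(0) = 0;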
  }
};

// Register the kernel for the CPU device
REGISTER_KERNEL_BUILDER(Name("MyAdd").Device(DEVICE_CPU), MyAddOp);
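
To make the kernel's behavior easier to check, here is a minimal pure-Python sketch of what the Compute method above produces. The name my_add_reference is a hypothetical helper, not part of TensorFlow. It assumes both inputs are flat lists of equal length and does not model int32 overflow.

```python
def my_add_reference(x, y):
    """Pure-Python model of the MyAdd kernel (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    # Element-wise sum, mirroring the kernel's loop over i = 1 .. N-1.
    out = [a + b for a, b in zip(x, y)]
    # The kernel overwrites element 0 with 0 (only if there is one).
    if out:
        out[0] = 0
    return out


print(my_add_reference([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]))  # [0, 6, 6, 6, 6]
```

A reference like this can be compared against the result of the compiled op once it is loaded in Python.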

Step 2: Compile with g++ to produce the .so file.
The host's g++ version is 5.4.0. Compile the op with the following commands:

TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared my_add.cc -o my_add.so -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

If the host's g++ version is older than 5, change the last command to:

g++ -std=c++11 -shared my_add.cc -o my_add.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

After running these commands, my_add.so appears in the current directory.
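
If you would rather drive the build from Python, the g++ command can be assembled as a list, roughly as sketched here. The helper build_gpp_command is hypothetical, and the flag values in the example are placeholders. In a real build they would come from tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags(), as in the shell commands.

```python
def build_gpp_command(source, output, compile_flags, link_flags, old_abi=True):
    """Assemble the g++ argument list for the op build (hypothetical helper)."""
    cmd = ["g++", "-std=c++11", "-shared", source, "-o", output, "-fPIC"]
    if old_abi:
        # Used with g++ >= 5 in this post; leave it out for older compilers.
        cmd.append("-D_GLIBCXX_USE_CXX11_ABI=0")
    cmd += list(compile_flags) + list(link_flags) + ["-O2"]
    return cmd


# Placeholder flags for illustration only, not real TensorFlow paths.
example = build_gpp_command(
    "my_add.cc",
    "my_add.so",
    compile_flags=["-I/path/to/tensorflow/include"],
    link_flags=["-L/path/to/tensorflow", "-ltensorflow_framework"],
)
print(" ".join(example))
```

The resulting list can be passed to subprocess.check_call to run the compiler.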

Step 3: Load the .so file and call the op written earlier.
Use tf.load_op_library to load the .so file. The op written earlier can then be called.

import tensorflow as tf
so_file = 'your_add_so_file_path/my_add.so'

if __name__ == "__main__":
  my_add_module = tf.load_op_library(so_file)
  out = my_add_module.my_add([5, 4, 3, 2, 1], [1, 2, 3, 4, 5])
  # tf.Session is the TF 1.x graph-mode API
  sess = tf.Session()
  result = sess.run(out)
  print(result)
  # prints [0 6 6 6 6]

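Note: when the op is registered, tf automatically generates a Python wrapper, so the function name used to call the op from Python may differ from the name defined in the cpp file (here MyAdd becomes my_add). In Python, dir(package_name) lists everything in the loaded module, which lets you find the op's Python wrapper.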
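
As a rough guide to the renaming, the sketch below converts a CamelCase op name to snake_case. It is only an approximation of what tf does. The helper camel_to_snake is hypothetical, and names with acronyms or other unusual patterns may come out differently, so dir() on the loaded module remains the reliable check.

```python
import re


def camel_to_snake(op_name):
    """Approximate the CamelCase -> snake_case renaming (hypothetical helper)."""
    # Insert "_" before an uppercase letter that follows a lowercase letter or digit.
    with_underscores = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", op_name)
    return with_underscores.lower()


print(camel_to_snake("MyAdd"))    # my_add
print(camel_to_snake("ZeroOut"))  # zero_out
```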