Tensor Data Structure
The basic data structure of mshadow is Tensor. The following is a simplified version of mshadow/tensor.h:
typedef unsigned index_t;
template<int dimension>
struct Shape {
index_t shape_[dimension];
};
template<typename Device, int dimension, typename DType = float>
struct Tensor {
DType *dptr_;
Shape<dimension> shape_;
Stream<Device> stream_;
index_t stride_;
};
// this is how a shape object declaration looks like
Shape<2> shape2;
// this is how tensor object declarations look like;
// you can pick the device, dimension and element type via the template parameters
Tensor<cpu, 2> ts2;
Tensor<gpu, 3, float> ts3;
Tensor<cpu, 2> denotes a two-dimensional tensor on CPU, while Tensor<gpu, 3, float> denotes a three-dimensional float tensor on GPU. Since Shape and Tensor are templates, the specialized Tensor<cpu, 2, float> is equivalent to:
struct Shape<2> {
index_t shape_[2];
};
struct Tensor<cpu, 2, float> {
float *dptr_;
Shape<2> shape_;
index_t stride_;
};
- In Tensor<cpu, 2>, dptr_ points to the space that backs the tensor, shape_ stores the shape (same convention as numpy), and stride_ gives the number of cells allocated in the lowest dimension; stride_ can be larger than shape_[1] when padding cells are added for memory alignment. The following code shows how these fields work together:
float data[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
Tensor<cpu, 2> ts;
ts.dptr_ = data;
ts.shape_ = mshadow::Shape2(3, 2);
ts.stride_ = 3;
// now: ts[0][0] == 0, ts[0][1] == 1 , ts[1][0] == 3, ts[1][1] == 4
for (index_t i = 0; i < ts.size(0); ++i) {
for (index_t j = 0; j < ts.size(1); ++j) {
printf("ts[%u][%u]=%f\n", i, j, ts[i][j]);
}
}
The result is viewed as a 3 x 2 matrix, with data[2], data[5] and data[8] treated as padding and skipped (translator's note: those values are still in memory, they just cannot be reached through ts, and the rows ts[0], ts[1], ts[2] are not contiguous to each other). Element ts[i][j] maps to dptr_[i * stride_ + j]. If you want a contiguous layout, set stride_ to shape_[1], i.e. 2; then ts addresses data[0] through data[5] and the tail of the array is simply unused.
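For comparison, here is a minimal sketch of a fully packed layout, in which stride_ equals shape_[1] (the six-element array and the variable names are made up for illustration):
float packed[6] = {0, 1, 2, 3, 4, 5};
Tensor<cpu, 2> flat;
flat.dptr_ = packed;
flat.shape_ = mshadow::Shape2(3, 2);
flat.stride_ = flat.shape_[1];  // stride_ == 2, so rows are stored back to back
// flat[i][j] reads packed[i * flat.stride_ + j]; all six values are reachable:
// flat[0][1] == 1, flat[1][0] == 2, flat[2][1] == 5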
NOTICE: We highly recommend using streams in GPU mode; an error will be thrown if no stream is set. Check basic_stream.cu for more details.
Memory Allocation
An important design decision in mshadow is that the Tensor data structure is a whitebox: it works as long as dptr_, shape_ and stride_ are set to the right values:
* For Tensor<cpu, 2>, the space can come from new float[] or point to an existing array, as in the example above; for Tensor<gpu, 2>, the space must live in GPU memory. mshadow also provides functions that allocate and free the space explicitly:
// create a 5 x 3 tensor on the device, and allocate space
Tensor<gpu, 2> ts2(Shape2(5, 3));
AllocSpace(&ts2);
// allocate 5 x 3 x 2 tensor on the host, initialized by 0
Tensor<cpu, 3> ts3 = NewTensor<cpu>(Shape3(5,3,2), 0.0f);
// free space
FreeSpace(&ts2); FreeSpace(&ts3);
All memory allocation is explicit: no implicit allocation or deallocation happens during any operation. This means a Tensor<cpu, 2> variable behaves like a reference handle (pointer) rather than an object; assigning one tensor to another does not copy data, the two simply share the same underlying memory.
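A small sketch of this handle behaviour (the variable names a and b are made up for illustration):
Tensor<cpu, 2> a = NewTensor<cpu>(Shape2(2, 3), 1.0f);
Tensor<cpu, 2> b = a;   // no allocation and no copy: b refers to the same memory as a
b[0][0] = 42.0f;        // the change is visible through a as well
FreeSpace(&a);          // free the space once; b is now a dangling handle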
Elementwise Operations
All operators in mshadow (such as +, -, *, /, +=) are elementwise. Consider the following SGD weight update code:
void UpdateSGD(Tensor<cpu, 2> weight, Tensor<cpu, 2> grad, float eta, float lambda) {
weight -= eta * (grad + lambda * weight);
}
At compile time, this code is translated into the following form:
void UpdateSGD(Tensor<cpu,2> weight, Tensor<cpu,2> grad, float eta, float lambda) {
for (index_t y = 0; y < weight.size(0); ++y) {
for (index_t x = 0; x < weight.size(1); ++x) {
weight[y][x] -= eta * (grad[y][x] + lambda * weight[y][x]);
}
}
}
As shown above, the translated code contains no memory allocation. For Tensor<gpu, dim>, the same expression is compiled into an equivalent CUDA kernel instead. The translation happens at compile time via expression templates, so we can write one concise line and still get the performance of the hand-written loop.
One code for both CPU and GPU
Because Tensor<cpu, dim> and Tensor<gpu, dim> share the same interface, we can write the code once and run it on both CPU and GPU. For example, the following UpdateSGD compiles for both devices:
template<typename xpu>
void UpdateSGD(Tensor<xpu, 2> weight, const Tensor<xpu, 2> &grad,
float eta, float lambda) {
weight -= eta * (grad + lambda * weight);
}
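A usage sketch of the template above (the shapes and hyper-parameters are made up; in GPU mode you would additionally set streams as noted earlier):
Tensor<cpu, 2> weight = NewTensor<cpu>(Shape2(4, 4), 1.0f);
Tensor<cpu, 2> grad = NewTensor<cpu>(Shape2(4, 4), 0.1f);
UpdateSGD(weight, grad, 0.01f, 0.001f);  // xpu is deduced as cpu
// passing Tensor<gpu, 2> arguments instead turns the same update into a CUDA kernel
FreeSpace(&weight); FreeSpace(&grad);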
Matrix Multiplications
We can also write a simple line of code for matrix multiplication, which at compile time is translated into calls to standard libraries such as MKL and cuBLAS:
template<typename xpu>
void Backprop(Tensor<xpu, 2> gradin,
const Tensor<xpu, 2> &gradout,
const Tensor<xpu, 2> &netweight) {
gradin = dot(gradout, netweight.T());
}
Again, this code compiles and runs on both CPU and GPU.
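To make the shape convention concrete, here is a hedged usage sketch (the layer sizes are invented: a batch of 8 examples, 5 inputs, 10 outputs, weights stored as input x output):
Tensor<cpu, 2> gradin = NewTensor<cpu>(Shape2(8, 5), 0.0f);    // gradient w.r.t. the input
Tensor<cpu, 2> gradout = NewTensor<cpu>(Shape2(8, 10), 0.0f);  // gradient w.r.t. the output
Tensor<cpu, 2> weight = NewTensor<cpu>(Shape2(5, 10), 0.0f);   // connection weights
Backprop(gradin, gradout, weight);  // dot of (8 x 10) and (10 x 5) fills the (8 x 5) result
FreeSpace(&gradin); FreeSpace(&gradout); FreeSpace(&weight);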
User Defined Operator
In applications we often need functions of our own. Suppose, for example, that mshadow did not provide an elementwise sigmoid transformation; we can add it with the following code:
struct sigmoid {
MSHADOW_XINLINE static float Map(float a) {
return 1.0f / (1.0f + expf(-a));
}
};
template<typename xpu>
void ExampleSigmoid(Tensor<xpu, 2> out, const Tensor<xpu, 2> &in) {
out = F<sigmoid>(in * 2.0f) + 1.0f;
}
The translated CPU code looks like this:
template<typename xpu>
void ExampleSigmoid(Tensor<xpu, 2> out, const Tensor<xpu, 2> &in) {
for (index_t y = 0; y < out.size(0); ++y) {
for(index_t x = 0; x < out.size(1); ++x) {
out[y][x] = sigmoid::Map(in[y][x] * 2.0f) + 1.0f;
}
}
}
User-defined operations compose freely with other expressions: besides out = F<sigmoid>(in), we can also write out = F<sigmoid>(in) + 2.0f or out = F<sigmoid>(F<sigmoid>(in)).
This code can likewise be translated into a CUDA kernel and run on the GPU; see defop.cpp for more detail.
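Following the same pattern, a function with a two-argument Map can be applied to a pair of expressions through the binary form of F. The sketch below (the operator name maximum and the wrapper ElemwiseMax are made up for illustration) computes an elementwise maximum:
struct maximum {
MSHADOW_XINLINE static float Map(float a, float b) {
return a > b ? a : b;
}
};
template<typename xpu>
void ElemwiseMax(Tensor<xpu, 2> out,
const Tensor<xpu, 2> &lhs, const Tensor<xpu, 2> &rhs) {
out = F<maximum>(lhs, rhs);  // elementwise maximum of the two input tensors
}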
Complete Example
The following code, taken from basic.cpp, demonstrates common usage of mshadow.
// header file to use mshadow
#include "mshadow/tensor.h"
// this namespace contains all data structures, functions
using namespace mshadow;
// this namespace contains all operator overloads
using namespace mshadow::expr;
int main(void) {
// initialize tensor engine before using tensor operation, needed for CuBLAS
InitTensorEngine<cpu>();
// assume we have a float space
float data[20];
// create a 2 x 5 x 2 tensor, from existing space
Tensor<cpu, 3> ts(data, Shape3(2,5,2));
// take first subscript of the tensor
Tensor<cpu, 2> mat = ts[0];
// a Tensor object is only a handle; assignment makes both variables refer to the same data
// we can specify the content type of a Tensor; if not specified, it is float by default
Tensor<cpu, 2, float> mat2 = mat;
// shape of matrix, note size order is the same as numpy
printf("%u X %u matrix\n", mat.size(0), mat.size(1));
// initialize all element to zero
mat = 0.0f;
// assign some values
mat[0][1] = 1.0f; mat[1][0] = 2.0f;
// elementwise operations
mat += (mat + 10.0f) / 10.0f + 2.0f;
// print out the matrix; note that mat and mat2 are handles (pointers) to the same data
for (index_t i = 0; i < mat.size(0); ++i) {
for (index_t j = 0; j < mat.size(1); ++j) {
printf("%.2f ", mat2[i][j]);
}
printf("\n");
}
// shutdown tensor engine after usage
ShutdownTensorEngine<cpu>();
return 0;
}
References
[mshadow@github] https://github.com/dmlc/mshadow/tree/master/guide