A First Look at Caffe: Data Reading

perfects110


License: CC 4.0 BY-SA

Tags: C++ Caffe Linux

Article link: https://blog.youkuaiyun.com/perfects110/article/details/81000256

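Caffe, the C++ deep learning framework originally developed at UC Berkeley (BVLC), loads its data through a caching mechanism. When the network starts running, the data layer runs first and splits the data into batches. The number of batches read ahead can be changed through the data layer's prefetch parameter: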

data_param{
  prefetch: 2
}

Setting it to 2 means that two batches are read from disk into memory ahead of time, which forms a cache.

The relevant source code is analyzed below.

Caffe stores the data it reads in the Batch class:

template <typename Dtype>
class Batch{
  public:
    Blob<Dtype> data_, label_;
};

Blob is Caffe's basic class for data storage. data_ holds the data to be stored, with shape num × C × H × W, and label_ holds the corresponding labels, with shape num.
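To make the num × C × H × W layout concrete, here is a minimal sketch of how a 4-D index maps onto one contiguous buffer. TinyBlob is a hypothetical stand-in written for this illustration, not Caffe's actual Blob class; it only assumes the row-major layout described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4-D blob; not Caffe's actual Blob class.
struct TinyBlob {
  int num, channels, height, width;
  std::vector<float> data;  // one contiguous buffer, row-major

  TinyBlob(int n, int c, int h, int w)
      : num(n), channels(c), height(h), width(w),
        data(static_cast<std::size_t>(n) * c * h * w, 0.0f) {}

  // Flat index of element (n, c, h, w) in the num x C x H x W layout.
  std::size_t offset(int n, int c, int h, int w) const {
    assert(n >= 0 && n < num && c >= 0 && c < channels);
    assert(h >= 0 && h < height && w >= 0 && w < width);
    return ((static_cast<std::size_t>(n) * channels + c) * height + h) *
               width + w;
  }
};
```

With a blob of shape 2 × 3 × 4 × 5, the second image starts at flat index 60 (one image spans 3 × 4 × 5 elements), and the last element (1, 2, 3, 4) sits at index 119.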

When the data layer is set up, its LayerSetUp function runs. It calls StartInternalThread, defined in internal_thread.cpp, which contains the following statement:

try {
  thread_.reset(new boost::thread(&InternalThread::entry, this, device, mode,
        rand_seed, solver_count, solver_rank, multiprocess));
} catch (std::exception& e) {
  LOG(FATAL) << "Thread exception: " << e.what();
}

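This creates and starts a thread through the boost::thread class (the Boost counterpart of std::thread from C++11). The thread runs the entry function, which in turn calls InternalThreadEntry. That function contains the following loop: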

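try {
  while (!must_stop()) {
    // Take an empty batch; pop() blocks while none is free
    Batch<Dtype>* batch = prefetch_free_.pop();
    // Fill the batch with data
    load_batch(batch);
#ifndef CPU_ONLY
    if (Caffe::mode() == Caffe::GPU) {
      // Copy data (and labels, if output) to the GPU asynchronously
      batch->data_.data().get()->async_gpu_push(stream);
      if (this->output_labels_) {
        batch->label_.data().get()->async_gpu_push(stream);
      }
      // Wait for the copies to finish before publishing the batch
      CUDA_CHECK(cudaStreamSynchronize(stream));
    }
#endif
    // Hand the filled batch over to the consumer
    prefetch_full_.push(batch);
  }
} catch (boost::thread_interrupted&) {
  // Interrupted exception is expected on shutdown
}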
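The two excerpts above share one pattern: start a thread on a member function, then loop until a stop is requested. The sketch below reproduces that shape with the standard library only. It is an assumption-laden simplification: Worker, Start, Stop and Entry are hypothetical names, and an atomic flag is swapped in for Boost's thread interruption, which std::thread does not offer.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker; mirrors the start/entry/stop shape only.
class Worker {
 public:
  ~Worker() { Stop(); }

  void Start() {
    assert(!thread_.joinable());  // do not start twice
    stop_ = false;
    // Same pattern as the excerpt: a member function pointer plus `this`.
    thread_ = std::thread(&Worker::Entry, this);
  }

  void Stop() {
    stop_ = true;  // plays the role of the interruption request
    if (thread_.joinable()) thread_.join();
  }

  int iterations() const { return iterations_.load(); }

 private:
  bool must_stop() const { return stop_.load(); }

  // Thread body: keep working until a stop has been requested.
  void Entry() {
    while (!must_stop()) {
      ++iterations_;  // stands in for loading one batch
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> stop_{false};
  std::atomic<int> iterations_{0};
  std::thread thread_;
};
```

One difference worth noting: a flag is only checked at the top of the loop, whereas a Boost interruption can also wake a thread that is blocked inside a queue's pop(), which is why the Caffe loop catches boost::thread_interrupted.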

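must_stop() returns true once an interruption has been requested for this thread, which happens when the program finishes and shuts the thread down. The class defines three member variables for caching data: a vector that owns the Batch objects, and two blocking queues that pass pointers to them around: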

vector<shared_ptr<Batch<Dtype> > > prefetch_;
BlockingQueue<Batch<Dtype>*> prefetch_free_;
BlockingQueue<Batch<Dtype>*> prefetch_full_;

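When the loop runs, if prefetch_free_ still holds a free Batch, one is popped, filled by load_batch, and pushed onto prefetch_full_. If prefetch_free_ has no free slot, pop() blocks and the thread waits until a Batch is returned to it.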
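The free/full hand-off described above can be sketched in a few dozen lines. This is a simplified illustration, not Caffe's code: SimpleBlockingQueue, ToyBatch and RunPrefetchDemo are hypothetical names, the queue is built on std::mutex and std::condition_variable, and writing an id into the batch stands in for load_batch.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical simplified queue; Caffe's own BlockingQueue differs in detail.
template <typename T>
class SimpleBlockingQueue {
 public:
  void push(const T& t) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(t);
    }
    cond_.notify_one();
  }

  // Blocks the calling thread until an element is available.
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return !queue_.empty(); });
    T t = queue_.front();
    queue_.pop();
    return t;
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
};

struct ToyBatch { int id = -1; };  // stand-in for Batch<Dtype>

// A producer fills `total` batches using only `prefetch` buffers while the
// caller consumes them; returns the ids in the order they were consumed.
std::vector<int> RunPrefetchDemo(int prefetch, int total) {
  assert(prefetch > 0 && total >= 0);
  std::vector<std::unique_ptr<ToyBatch>> storage;  // owns the buffers
  SimpleBlockingQueue<ToyBatch*> free_q, full_q;
  for (int i = 0; i < prefetch; ++i) {
    storage.emplace_back(new ToyBatch());
    free_q.push(storage.back().get());
  }

  std::thread producer([&] {
    for (int i = 0; i < total; ++i) {
      ToyBatch* b = free_q.pop();  // waits when every buffer is in use
      b->id = i;                   // stands in for load_batch(b)
      full_q.push(b);
    }
  });

  std::vector<int> seen;
  for (int i = 0; i < total; ++i) {
    ToyBatch* b = full_q.pop();  // waits when nothing is prefetched yet
    seen.push_back(b->id);
    free_q.push(b);              // hand the buffer back for reuse
  }
  producer.join();
  return seen;
}
```

Calling RunPrefetchDemo(2, 10) moves ten batches through only two buffers. The producer can never run more than two batches ahead, which is the effect the prefetch parameter controls.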

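The part above covers the caching mechanism. When the cached data is fed into the network, the data layer's Forward function is called, which contains the following block: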

if (prefetch_current_) {
  prefetch_free_.push(prefetch_current_);
}
prefetch_current_ = prefetch_full_.pop("Waiting for data");

prefetch_current_ is the batch that is fed into the network.
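A detail of the Forward block is that the batch in use is only returned to the free queue on the next call, so the consumer always holds exactly one batch between calls. The single-threaded sketch below illustrates that recycling order. HeldBatch and ToyConsumer are hypothetical names, and plain std::deque objects replace the blocking queues, so an empty full queue trips an assertion where the real code would wait.

```cpp
#include <cassert>
#include <deque>

struct HeldBatch { int id; };  // hypothetical stand-in for Batch<Dtype>

// Hypothetical consumer: keeps one batch between calls, like prefetch_current_.
class ToyConsumer {
 public:
  ToyConsumer(std::deque<HeldBatch*>* free_q, std::deque<HeldBatch*>* full_q)
      : free_q_(free_q), full_q_(full_q), current_(nullptr) {}

  // Returns the id of the batch now in use.
  int Forward() {
    if (current_) {
      free_q_->push_back(current_);  // previous batch may be refilled now
    }
    assert(!full_q_->empty());  // a blocking queue would wait here instead
    current_ = full_q_->front();
    full_q_->pop_front();
    return current_->id;
  }

 private:
  std::deque<HeldBatch*>* free_q_;
  std::deque<HeldBatch*>* full_q_;
  HeldBatch* current_;
};
```

After the first Forward call the free queue is still empty, because the first batch is in use; it only reappears there after the second call.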