Method
Prototype of the propagation functions in PyCaffe (using the forward pass as an example):
```python
def _Net_forward(self, blobs=None, start=None, end=None, **kwargs):
    """
    Forward pass: prepare inputs and run the net forward.

    Parameters
    ----------
    blobs : list of blobs to return in addition to output blobs.
    kwargs : Keys are input blob names and values are blob ndarrays.
             For formatting inputs for Caffe, see Net.preprocess().
             If None, input is taken from data layers.
    start : optional name of layer at which to begin the forward pass
    end : optional name of layer at which to finish the forward pass
          (inclusive)

    Returns
    -------
    outs : {blob name: blob ndarray} dict.
    """
    # (function body omitted)
```
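The docstring does not say how start and end are resolved. As far as I can tell from the PyCaffe source of that period, _Net_forward looks both names up in the net's list of layer names and passes the resulting indices to Net::ForwardFromTo(), so both ends are inclusive. The C++ sketch below models only that lookup; layers_to_run is a hypothetical helper, and an empty string stands in for Python's None.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: turn optional start/end layer names into the inclusive
// range of layers that a ForwardFromTo(start_ind, end_ind) call would run.
// An empty string plays the role of Python's None.
std::vector<std::string> layers_to_run(const std::vector<std::string>& layer_names,
                                       const std::string& start = "",
                                       const std::string& end = "") {
  assert(!layer_names.empty());
  auto index_of = [&](const std::string& name) {
    auto it = std::find(layer_names.begin(), layer_names.end(), name);
    if (it == layer_names.end())  // Python's list.index() raises ValueError here
      throw std::invalid_argument("no layer named " + name);
    return static_cast<std::size_t>(it - layer_names.begin());
  };
  const std::size_t start_ind = start.empty() ? 0 : index_of(start);
  const std::size_t end_ind = end.empty() ? layer_names.size() - 1 : index_of(end);
  std::vector<std::string> run;
  for (std::size_t i = start_ind; i <= end_ind; ++i)  // both ends inclusive
    run.push_back(layer_names[i]);
  return run;
}
```

For example, with the (illustrative) layer order conv5, relu5, rpn_conv/3x3, roi_pool_conv5, calling layers_to_run(layers, "relu5", "rpn_conv/3x3") yields relu5 and rpn_conv/3x3.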
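The start and end parameters select the layers at which propagation begins and ends, and the blobs parameter gives a list of additional blobs to return. Any blob can also be accessed directly through the caffe.Net object:
```python
net.blobs['BLOB_NAME'].data[...] = ...  # right-hand side: an array broadcastable to the blob's shape
```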
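This works for both reading and modifying the blob, because the data attribute obtains its pointer through the Blob class's mutable_cpu_data() function. That function also sets the head_ member of the underlying SyncedMemory object to HEAD_AT_CPU, so when the corresponding layer's propagation function runs next, the data is automatically synchronized between CPU and GPU.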
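The head_ bookkeeping can be modelled in a few lines. The class below is a simplified, hypothetical stand-in for SyncedMemory: the "GPU" buffer is just a second host vector and vector copies stand in for device transfers. Its transitions are assumed to follow Caffe's to_cpu()/to_gpu() logic: a mutable CPU access marks the CPU copy as the newest one, and the next GPU access copies it over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical model of SyncedMemory's head_ state machine.
class ToySyncedMemory {
 public:
  enum Head { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

  explicit ToySyncedMemory(std::size_t n) : cpu_(n, 0.f), gpu_(n, 0.f) {}

  const float* cpu_data() { to_cpu(); return cpu_.data(); }
  float* mutable_cpu_data() { to_cpu(); head_ = HEAD_AT_CPU; return cpu_.data(); }
  const float* gpu_data() { to_gpu(); return gpu_.data(); }
  float* mutable_gpu_data() { to_gpu(); head_ = HEAD_AT_GPU; return gpu_.data(); }
  Head head() const { return head_; }

 private:
  void to_cpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_CPU;  // buffers start zero-filled
    } else if (head_ == HEAD_AT_GPU) {
      cpu_ = gpu_;          // stands in for a device -> host copy
      head_ = SYNCED;
    }                       // HEAD_AT_CPU or SYNCED: CPU copy is already current
  }
  void to_gpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_GPU;
    } else if (head_ == HEAD_AT_CPU) {
      gpu_ = cpu_;          // stands in for a host -> device copy
      head_ = SYNCED;
    }                       // HEAD_AT_GPU or SYNCED: GPU copy is already current
  }

  std::vector<float> cpu_, gpu_;
  Head head_ = UNINITIALIZED;
};
```

In PyCaffe terms, writing through net.blobs[...].data corresponds to the mutable_cpu_data() call, and the gpu_data() call made by the next GPU layer is what triggers the copy.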
Problems
The returned blob does not exist
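When end is given, _Net_forward() in PyCaffe automatically looks up the blob with the same name as the end layer and tries to return it. If the end layer has no blob of that name (for example an in-place layer, whose top is named after its bottom), the lookup fails. In that case, remove [end] from the following line in pycaffe.py:
```python
outputs = set([end] + blobs)
```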
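To see why the lookup fails, here is a small C++ model of the return-value step. collect_outputs and BlobTable are hypothetical stand-ins for the code in pycaffe.py and for net.blobs, and the names assume that relu5 is an in-place layer whose top is conv5, so no blob is called relu5.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for net.blobs: blob name -> data.
using BlobTable = std::map<std::string, std::vector<float>>;

// Models how the old _Net_forward builds its return value when end is set.
//   patched == false: outputs = set([end] + blobs)
//   patched == true:  outputs = set(blobs)   ([end] removed)
BlobTable collect_outputs(const BlobTable& net_blobs, const std::string& end,
                          const std::vector<std::string>& blobs, bool patched) {
  std::set<std::string> outputs(blobs.begin(), blobs.end());
  if (!patched) outputs.insert(end);  // assumes a blob named like the end layer exists
  BlobTable outs;
  for (const auto& name : outputs) {
    auto it = net_blobs.find(name);
    if (it == net_blobs.end())  // in PyCaffe: KeyError from self.blobs[name]
      throw std::out_of_range("no blob named " + name);
    outs[name] = it->second;
  }
  return outs;
}
```

With the patch applied, the end layer's own output has to be requested explicitly, e.g. by passing its top blob (conv5 here) in blobs.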
After a blob is modified, propagation can only start from the first layer that references it
Symptoms
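In Faster R-CNN this mainly shows up as follows: after modifying the conv5 blob (net.blobs['conv5']), the forward pass can only be started from the relu5 layer, not from the rpn_conv/3x3 or roi_pool_conv5 layer.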
Analysis
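The Net class manages all of its blobs through smart pointers (shared_ptr<Blob<Dtype> >), which are stored in the blobs_ vector. In addition, each layer has its own vectors of input/output blob pointers (Blob<Dtype>*), so that every layer's bottom/top blobs can be located quickly.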
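Net::Init() initializes the network. While setting up each layer, it calls AppendBottom() and AppendTop() to maintain that layer's vectors of input/output blob pointers (Blob<Dtype>*).
AppendTop() allocates a new blob (shared_ptr<Blob<Dtype> >) for every top that is not computed in place, updates the layer's top blob pointer vector, and records the top blobs that are not yet connected in the available_blobs set.
AppendBottom() checks whether each bottom blob is in the available_blobs set. If it is, the blob is added to the layer's bottom blob pointer vector and removed from available_blobs; if it is not, a fatal error is raised.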
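The available_blobs bookkeeping can be sketched with blob names alone. In this hypothetical model (ToyLayer and check_connectivity are not Caffe names), the check-and-erase of AppendBottom() and the insert of AppendTop() are applied layer by layer, so a blob that two layers take as bottom is rejected unless a Split layer has produced separate copies first. Layer and copy names are illustrative, and the layers are simplified (roi_pool_conv5, for instance, really has a second bottom).

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, name-only layer description.
struct ToyLayer {
  std::string name;
  std::vector<std::string> bottoms;
  std::vector<std::string> tops;
};

// Models the available_blobs bookkeeping of Net::Init(): every bottom must be
// available and is then consumed (erased); every top, including an in-place
// top, becomes available afterwards.
void check_connectivity(const std::vector<ToyLayer>& layers) {
  std::set<std::string> available_blobs;
  for (const auto& layer : layers) {
    for (const auto& bottom : layer.bottoms) {  // AppendBottom()
      if (available_blobs.erase(bottom) == 0)   // not available: fatal in Caffe
        throw std::runtime_error("Unknown bottom blob '" + bottom +
                                 "' (layer '" + layer.name + "')");
    }
    for (const auto& top : layer.tops)          // AppendTop()
      available_blobs.insert(top);
  }
}
```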
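Because a blob is removed from available_blobs as soon as one layer consumes it, the Net class requires each bottom blob to be used by only that one layer. When the prototxt defines a blob that is used by several layers, Caffe therefore inserts a Split layer automatically (caffe/util/insert_splits.hpp). The Split layer turns the blob into several copies (distinct Blob objects whose forward data share one SyncedMemory, while each has its own SyncedMemory for the backward diff) and feeds one copy to each layer that references the blob. In PyCaffe, however, the blob we modify under its prototxt name is the one before the split. When its size changes, the SyncedMemory of its copies is no longer correct: the copies keep the old shape and, if the blob had to allocate a larger buffer, still point to the old SyncedMemory until the Split layer itself runs again.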
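The stale-copy effect can be reproduced with a toy blob whose buffer is held through a shared_ptr, just as Blob::data_ holds its SyncedMemory. The sketch assumes two details of Caffe's implementation: Blob::Reshape() allocates a new buffer only when the count grows beyond its capacity, and the Split layer's Reshape() calls ShareData() so that each copy points at the bottom's current buffer. ToyBlob and SplitReshape are hypothetical names, and the buffer is a plain vector rather than a SyncedMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy blob: the data buffer is held through a shared_ptr,
// so several blobs can share one buffer.
class ToyBlob {
 public:
  void Reshape(std::size_t count) {
    count_ = count;
    if (count_ > capacity_) {  // growing: allocate a fresh buffer; sharers keep the old one
      capacity_ = count_;
      data_ = std::make_shared<std::vector<float>>(capacity_, 0.f);
    }
  }
  void ShareData(const ToyBlob& other) {
    assert(count_ == other.count_);  // Caffe CHECKs that the counts match
    data_ = other.data_;             // both blobs now use the same buffer
  }
  float* mutable_data() { return data_->data(); }
  std::size_t count() const { return count_; }

 private:
  std::size_t count_ = 0;
  std::size_t capacity_ = 0;
  std::shared_ptr<std::vector<float>> data_;
};

// Reshape step of a toy Split layer: each copy takes the bottom's size
// and re-shares its current buffer.
void SplitReshape(const ToyBlob& bottom, std::vector<ToyBlob>& copies) {
  for (auto& copy : copies) {
    copy.Reshape(bottom.count());
    copy.ShareData(bottom);
  }
}
```

After the bottom grows, the copies still report the old count and old data until SplitReshape() runs again. In the Caffe versions I am familiar with, Layer::Forward() calls Reshape() before computing, so a pass started at relu5 runs the Split layer and re-links the copies, while a pass started at rpn_conv/3x3 skips it.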
Solutions
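There are many possible solutions; the most obvious ones are:
- Add the Split layers to the prototxt by hand, so that Caffe does not split the blob automatically.
- Use the glog output to find the names or IDs of the blobs created by the automatic split, and modify the corresponding blobs in PyCaffe.
- Modify Net::ForwardFromTo(int start, int end) so that it performs the blob copying itself, which removes the need for a Split layer.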
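For the second option, the names of the split copies can be predicted. The function below is my reimplementation of the naming scheme used by Caffe's SplitBlobName() (declared in caffe/util/insert_splits.hpp); treat it as an assumption and confirm it against the glog output of your own net. split_blob_name itself is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumed to mirror the naming of Caffe's SplitBlobName(); verify against
// your Caffe version and the split blob names printed by glog.
//   layer_name: the layer that last wrote the blob as a top
//   blob_idx:   the blob's index among that layer's tops
//   split_idx:  which of the blob's consumers the copy is for
std::string split_blob_name(const std::string& layer_name, const std::string& blob_name,
                            int blob_idx, int split_idx) {
  std::ostringstream name;
  name << blob_name << "_" << layer_name << "_" << blob_idx << "_split_" << split_idx;
  return name.str();
}
```

Assuming relu5 is the in-place layer that last writes conv5, this gives conv5_relu5_0_split_0 and conv5_relu5_0_split_1, which can then be accessed through net.blobs like any other blob.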
About shared_ptr
The C++11 standard library header <memory> provides the shared_ptr smart pointer, which offers a limited garbage-collection facility. From the C++ Reference:
Manages the storage of a pointer, providing a limited garbage-collection facility, possibly sharing that management with other objects.
Objects of shared_ptr types have the ability of taking ownership of a pointer and share that ownership: once they take ownership, the group of owners of a pointer become responsible for its deletion when the last one of them releases that ownership.
shared_ptr objects release ownership on the object they co-own as soon as they themselves are destroyed, or as soon as their value changes either by an assignment operation or by an explicit call to shared_ptr::reset. Once all shared_ptr objects that share ownership over a pointer have released this ownership, the managed object is deleted (normally by calling ::delete, but a different deleter may be specified on construction).
shared_ptr objects can only share ownership by copying their value: If two shared_ptr are constructed (or made) from the same (non-shared_ptr) pointer, they will both be owning the pointer without sharing it, causing potential access problems when one of them releases it (deleting its managed object) and leaving the other pointing to an invalid location.
Additionally, shared_ptr objects can share ownership over a pointer while at the same time pointing to another object. This ability is known as aliasing (see constructors), and is commonly used to point to member objects while owning the object they belong to. Because of this, a shared_ptr may relate to two pointers:
- A stored pointer, which is the pointer it is said to point to, and the one it dereferences with operator*.
- An owned pointer (possibly shared), which is the pointer the ownership group is in charge of deleting at some point, and for which it counts as a use.
Generally, the stored pointer and the owned pointer refer to the same object, but alias shared_ptr objects (those constructed with the alias constructor and their copies) may refer to different objects.
A shared_ptr that does not own any pointer is called an empty shared_ptr. A shared_ptr that points to no object is called a null shared_ptr and shall not be dereferenced. Notice though that an empty shared_ptr is not necessarily a null shared_ptr, and a null shared_ptr is not necessarily an empty shared_ptr.
shared_ptr objects replicate a limited pointer functionality by providing access to the object they point to through operators * and ->. For safety reasons, they do not support pointer arithmetics.
A related class, weak_ptr, is able to share pointers with shared_ptr objects without owning them.
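The ownership rules quoted above can be checked directly with std::shared_ptr. The function below (its name, Payload and OwnershipReport are only for illustration) records the owner count after a copy, shows that an aliasing shared_ptr alone keeps the whole object alive, uses a weak_ptr to observe destruction, and builds an empty-but-not-null and a null-but-not-empty shared_ptr.

```cpp
#include <cassert>
#include <memory>

// Hypothetical owned object; `member` is the target of the aliasing pointer.
struct Payload {
  int member = 7;
};

struct OwnershipReport {
  long count_after_copy = 0;         // owners right after copying a shared_ptr
  bool alias_keeps_alive = false;    // an aliasing shared_ptr alone keeps Payload alive
  bool weak_expired_at_end = false;  // weak_ptr sees the deletion after the last owner
  bool empty_but_not_null = false;   // owns nothing, yet points somewhere
  bool null_but_not_empty = false;   // owns a pointer that happens to be null
};

OwnershipReport observe_ownership() {
  OwnershipReport r;
  std::weak_ptr<Payload> watcher;  // observes without owning
  {
    auto a = std::make_shared<Payload>();
    std::shared_ptr<Payload> b = a;             // copying shares ownership
    r.count_after_copy = a.use_count();         // a and b
    std::shared_ptr<int> alias(a, &a->member);  // owns the Payload, points at its member
    watcher = a;
    a.reset();                                  // release two of the three owners
    b.reset();
    r.alias_keeps_alive = !watcher.expired() && *alias == 7;
  }                                             // alias destroyed: Payload is deleted
  r.weak_expired_at_end = watcher.expired();

  int local = 0;
  std::shared_ptr<int> empty_alias{std::shared_ptr<int>{}, &local};  // alias of an empty owner
  r.empty_but_not_null = empty_alias.use_count() == 0 && empty_alias.get() == &local;
  std::shared_ptr<int> owns_null(static_cast<int*>(nullptr));        // takes ownership of nullptr
  r.null_but_not_empty = owns_null.use_count() == 1 && owns_null.get() == nullptr;
  return r;
}
```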
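In Caffe (which uses boost::shared_ptr, with the same ownership semantics as std::shared_ptr), every blob is created as a shared_ptr<Blob<Dtype> >. Its raw pointer is then obtained with get() and stored in the bottom_vecs_ and top_vecs_ vectors, where each layer's Forward/Backward functions access it. For these blobs, shared_ptr essentially only ensures that they are destroyed automatically when the Net that owns them is destroyed (for example at program exit).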
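A miniature version of this layout, with hypothetical MiniNet/MiniBlob types in place of Net and Blob, shows why the raw pointers are safe: blobs_ keeps each blob alive, and when the vector reallocates it moves only the shared_ptr objects, not the blobs they point to. Raw pointers taken with get() add no owners.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Blob<Dtype>.
struct MiniBlob {
  std::string name;
  std::vector<float> data;
};

// Hypothetical miniature of the Net layout: blobs_ owns every blob through
// shared_ptr; the per-layer vectors hold raw, non-owning pointers from get().
class MiniNet {
 public:
  MiniBlob* AddBlob(const std::string& name, std::size_t count) {
    blobs_.push_back(std::make_shared<MiniBlob>(MiniBlob{name, std::vector<float>(count, 0.f)}));
    // The MiniBlob itself never moves, even if blobs_ reallocates,
    // so this pointer stays valid for as long as blobs_ owns the blob.
    return blobs_.back().get();
  }
  void AddLayer(std::vector<MiniBlob*> bottoms, std::vector<MiniBlob*> tops) {
    bottom_vecs_.push_back(std::move(bottoms));
    top_vecs_.push_back(std::move(tops));
  }
  const std::vector<std::shared_ptr<MiniBlob>>& blobs() const { return blobs_; }
  const std::vector<std::vector<MiniBlob*>>& top_vecs() const { return top_vecs_; }

 private:
  std::vector<std::shared_ptr<MiniBlob>> blobs_;
  std::vector<std::vector<MiniBlob*>> bottom_vecs_;
  std::vector<std::vector<MiniBlob*>> top_vecs_;
};
```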