CNN Notes (CS231N): Deep Learning Software

This article compares mainstream deep learning frameworks such as TensorFlow, PyTorch, and Caffe, covering their computational-graph concepts, automatic differentiation, model training workflow, and the use of pretrained models, with the goal of helping readers understand each framework's strengths and suitable use cases.


Deep Learning Frameworks

When working with CNNs we usually rely on a deep learning framework to reduce the amount of work. The sections below go through the most commonly used ones (TensorFlow, PyTorch, Caffe).

Deep learning frameworks matter for a few reasons: they make it easy to build large computational graphs, they compute gradients automatically, and they run efficiently on GPUs.

Let's first see what building a computational graph from scratch in numpy looks like. Two problems stand out: the code cannot run on a GPU, and we have to derive and compute the gradients by hand.
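As a rough illustration (not the exact lecture code; the layer sizes and names below are arbitrary), a two-layer ReLU network written in plain numpy looks like this, with every gradient coded by hand:

```python
import numpy as np

# two-layer ReLU network in plain numpy: CPU only, gradients by hand
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = np.random.randn(N, D_in), np.random.randn(N, D_out)
w1, w2 = np.random.randn(D_in, H), np.random.randn(H, D_out)

lr = 1e-6
for t in range(500):
    # forward pass
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    # backward pass, written out manually
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # gradient descent step
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
```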

A deep learning framework solves both problems: moving the program onto a GPU takes only a line or two of code, and the framework computes the gradients for us automatically.

TensorFlow

Now let's look at TensorFlow in more detail. In the program shown in the lecture, everything before tf.Session() defines the computational graph and performs no actual computation; the code inside the session feeds inputs into the graph, runs it, and computes the gradients. During graph definition we first create a placeholder for each input, then describe the structure of the graph and the loss function. To run it, we build numpy arrays to fill the placeholders and then execute the graph.
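A minimal sketch of that structure using the TF 1.x-style API (the sizes and names below are my own, not the slide's):

```python
import numpy as np
import tensorflow as tf  # TF 1.x-style API (tf.compat.v1 in TF 2.x)

N, D, H = 64, 1000, 100

# --- graph definition: no computation happens here ---
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum(tf.square(y - y_pred))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# --- running the graph: feed numpy arrays into the placeholders ---
with tf.Session() as sess:
    values = {x: np.random.randn(N, D),
              y: np.random.randn(N, D),
              w1: np.random.randn(D, H),
              w2: np.random.randn(H, D)}
    loss_val, g1, g2 = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
```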

To update the weights we need to add a few more lines. But this approach has a problem: the graph runs on the GPU while the weight update happens on the CPU, so on every iteration the weights have to be copied between CPU and GPU, which slows the program down considerably.

To fix this we can define the weights as tf.Variable so that they live inside the computational graph and persist between runs. Because the weights now exist in the graph, the weight-update step itself also has to be expressed as part of the graph definition.

If we run the code as written, though, we find that the loss never decreases. The reason is that although we defined the update operations, the session never actually executes them. The fix is to add a dummy node that depends on the update ops (it computes nothing useful itself) and fetch its value when running the graph; asking for that output forces the weight updates to run.
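Putting the three ideas together (weights as tf.Variable, updates as assign ops inside the graph, and a dummy group node that we fetch so the updates actually run), a hedged sketch looks like this:

```python
import numpy as np
import tensorflow as tf  # TF 1.x-style API

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))

# weights now live inside the graph and persist between sess.run calls
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum(tf.square(y - y_pred))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# the update itself is part of the graph ...
learning_rate = 1e-5
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
# ... and this dummy node forces both assign ops to execute when fetched
updates = tf.group(new_w1, new_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        loss_val, _ = sess.run([loss, updates], feed_dict=values)
```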

TensorFlow also provides higher-level APIs that spare us from hand-writing the loss, the weight updates, the layer definitions, and the initialization. This is usually simpler and often works better.
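For example, with the TF 1.x helper modules tf.layers, tf.losses, and tf.train (a sketch under the same toy setup as above, using fully connected rather than convolutional layers):

```python
import tensorflow as tf  # TF 1.x-style API

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))

# layers handle weight creation and initialization for us
h = tf.layers.dense(x, H, activation=tf.nn.relu)
y_pred = tf.layers.dense(h, D)

# predefined loss and optimizer replace the hand-written update ops
loss = tf.losses.mean_squared_error(labels=y, predictions=y_pred)
updates = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss)
```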

On top of that, high-level wrappers such as Keras package the whole pipeline and simplify it even further.
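A minimal Keras sketch (here via tf.keras; the lecture used standalone Keras, and the layer sizes are my own):

```python
import numpy as np
from tensorflow import keras

# the whole model, loss and training loop in a few lines
model = keras.Sequential([
    keras.layers.Dense(100, activation='relu', input_shape=(1000,)),
    keras.layers.Dense(10),
])
model.compile(optimizer='sgd', loss='mse')

x, y = np.random.randn(64, 1000), np.random.randn(64, 10)
model.fit(x, y, epochs=5, batch_size=64)
```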

Besides Keras there are several other high-level wrappers available (for example TF-Slim, TFLearn, and Sonnet).

TensorFlow also ships pretrained models and visualization tools (such as TensorBoard) for us to use.

PyTorch

Besides TensorFlow, another widely used framework is PyTorch. Many common PyTorch concepts map directly onto TensorFlow concepts.

PyTorch Tensors are similar to numpy arrays. Below is a neural network built out of raw Tensors; as with numpy, we still have to compute the gradients by hand.
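A sketch of that raw-Tensor version (sizes and names are mine); note that switching the device to 'cuda' is all it takes to run it on a GPU:

```python
import torch

device = torch.device('cpu')   # change to torch.device('cuda') for GPU
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

lr = 1e-6
for t in range(500):
    # forward pass on raw Tensors
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # backward pass still written by hand, just like the numpy version
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
```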

Once we bring in the computational-graph idea, we wrap values in Variables and let autograd compute the gradients automatically. Tensors and Variables share the same API. When declaring a Variable we state whether its gradient needs to be computed.
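A sketch with autograd; in current PyTorch, Variable has been merged into Tensor, so setting requires_grad=True on the weights plays the role the lecture's Variables played:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# requires_grad=True tells autograd to track gradients for these weights
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

lr = 1e-6
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()            # autograd fills in w1.grad and w2.grad

    with torch.no_grad():      # update weights without recording the ops
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()        # gradients accumulate, so clear them
        w2.grad.zero_()
```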

We can also define our own autograd functions to compute gradients.
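For instance, a ReLU written as a custom autograd function (a sketch in the current static-method style; older PyTorch versions used instance methods instead):

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)     # stash the input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0            # gradient is zero where the input < 0
        return grad_x

# usage: call through .apply so autograd can hook in the backward rule
x = torch.randn(64, 100, requires_grad=True)
y = MyReLU.apply(x).sum()
y.backward()
```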

Analogous to the high-level wrappers in TensorFlow, PyTorch has the nn package to simplify our work, as shown in the sketch below.
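A sketch of the same two-layer net built from nn modules (sizes are mine; the explicit parameter update here can also be replaced by an optimizer, as shown further down):

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x, y = torch.randn(N, D_in), torch.randn(N, D_out)

# layers, activation, and loss come ready-made from torch.nn
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

lr = 1e-4
for t in range(500):
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= lr * param.grad
```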


In addition, PyTorch provides optimizers (the optim package) to simplify the weight update.
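Continuing the sketch above, an optimizer object (here Adam, one of several available update rules) replaces the hand-written parameter loop:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x, y = torch.randn(N, D_in), torch.randn(N, D_out)
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H), torch.nn.ReLU(), torch.nn.Linear(H, D_out))
loss_fn = torch.nn.MSELoss(reduction='sum')

# the optimizer owns the update rule and the parameters it applies to
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # compute new gradients
    optimizer.step()        # apply the update rule to all parameters
```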

Besides using the built-in models, we can define our own model. When doing so we only write the forward pass; there is no need to define backward, because autograd handles the gradient computation automatically.
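A sketch of such a custom model: only __init__ and forward are written, and autograd derives the backward pass (the class name and sizes are my own):

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # only the forward pass is defined; no backward method is needed
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))
```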

PyTorch provides DataLoaders for feeding in data. The data we get from a DataLoader comes as Tensors; to use it in the network we wrap it in a Variable first (in older versions of PyTorch).
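A minimal sketch using TensorDataset and DataLoader; in current PyTorch the batches can be fed to the model directly, and the Variable wrapping only applies to the older API the lecture used:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# wrap tensors in a Dataset; DataLoader handles batching and shuffling
x, y = torch.randn(640, 1000), torch.randn(640, 10)
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

for epoch in range(2):
    for x_batch, y_batch in loader:
        # each batch is a pair of Tensors of shape (64, 1000) / (64, 10);
        # in old PyTorch these would be wrapped in Variable before use
        pass
```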

PyTorch also offers pretrained models (for example in torchvision), visualization tools, and so on.

We can compare TensorFlow and PyTorch as follows.

Static Graphs vs. Dynamic Graphs

Another difference between TensorFlow and PyTorch is that TensorFlow uses static graphs while PyTorch uses dynamic graphs.

The advantage of a static graph is that once the graph is defined up front, the framework can optimize its structure before running it.

Moreover, once the graph is built it can be serialized and then run independently of the code that constructed it.

The advantage of a dynamic graph is that conditionals, loops, and similar control flow are very easy to express; in a static graph these have to be translated into dedicated TensorFlow control-flow operators (such as tf.cond and tf.while_loop).
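A small sketch of the contrast (the PyTorch part is runnable as ordinary Python; the equivalent TF 1.x static-graph version is shown as a comment):

```python
import torch

def forward(x, w1, w2):
    # dynamic graph: an ordinary Python if decides the graph on each call
    if x.sum() > 0:
        return x.mm(w1)
    return x.mm(w2)

out = forward(torch.randn(4, 8), torch.randn(8, 3), torch.randn(8, 3))

# static graph (TF 1.x): the branch has to be baked into the graph with a
# dedicated control-flow op, roughly:
#   z = tf.cond(tf.reduce_sum(x) > 0,
#               lambda: tf.matmul(x, w1),
#               lambda: tf.matmul(x, w2))
```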


 

TensorFlow has since introduced its own dynamic-graph mechanism, but it still does not work as smoothly as PyTorch's.

Some application scenarios where dynamic graphs shine include recurrent networks, recursive (tree-structured) networks, and modular networks.

Caffe

Caffe is used relatively little in research. It lets you build a neural network with essentially no code, but correspondingly it makes it hard to change the details of the network.

The basic steps for using Caffe are as follows.

The first step is to convert the data into a format Caffe can read (for example LMDB or HDF5).

The second step is to define the network architecture in a prototxt file.

For very large networks, writing this prototxt by hand becomes very cumbersome.

The third step is to define a solver, which specifies the training hyperparameters (learning rate, optimization settings, and so on).

The fourth step is training, typically done by passing the solver file to the caffe binary on the command line.

Caffe also comes with pretrained models (the Model Zoo) and a Python interface for us to use.

 

Here are some of Caffe's strengths and weaknesses.

Caffe2 is the newer version, with many improvements over Caffe.

We can see that Google and Facebook take quite different approaches: Google wants a single framework (TensorFlow) to cover every use case, while Facebook offers two frameworks to meet the different needs of production (Caffe2) and research (PyTorch).

Finally, here is the lecturer's advice on choosing among the different neural network frameworks.

 

 
