Building a Deep Neural Network with MiniFlow

License: CC 4.0 BY-SA


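This article walks through building a deep neural network with MiniFlow. It covers the Node class and its subclasses Input, Linear, Sigmoid, and MSE, along with forward propagation, backpropagation, and gradient descent. The stochastic gradient descent loop in nn.py then shows how the network is trained, and why the forward and backward passes are central to deep learning.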


Working through MiniFlow is a good way to understand how TensorFlow works under the hood.

miniflow.py

import numpy as np

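class Node(object):
    def __init__(self, inbound_nodes=None):
        # A node holds its input nodes, its output nodes, a value, its gradients,
        # and the forward and backward propagation methods.
        # Avoid a mutable default argument: build a fresh list for every node.
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        self.outbound_nodes = []   # nodes that consume this node's value
        self.value = None          # may hold any data type
        # Keys are this node's inbound nodes (an Input uses itself as the key);
        # values are the partial derivatives of the cost with respect to them.
        self.gradients = {}
        # Each node registers itself as an output of every node that feeds it.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

    def forward(self):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError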

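Node is the base class, with four subclasses: Input, Linear, Sigmoid, and MSE. Each node holds two lists (its inbound nodes and its outbound nodes), a dictionary of gradients, a value, and two methods, forward and backward, that carry out forward propagation and backpropagation.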
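To make the wiring concrete, here is a minimal sketch of how constructing a node registers it with its inputs. The `Node` below is a stripped-down stand-in written for this example (it is not imported from miniflow.py), and the node names `x`, `w`, `f`, and `g` are hypothetical.

```python
class Node:
    """Stripped-down stand-in for the Node base class (illustration only)."""

    def __init__(self, inbound_nodes=None):
        self.inbound_nodes = list(inbound_nodes or [])
        self.outbound_nodes = []
        self.value = None
        self.gradients = {}
        # Register this node as a consumer of each of its inputs.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)


# Build a tiny graph: x and w feed f, and f feeds g.
x = Node()
w = Node()
f = Node([x, w])
g = Node([f])

print(f.inbound_nodes == [x, w])   # f knows its inputs
print(x.outbound_nodes == [f])     # x was told that f consumes it
print(g.outbound_nodes == [])      # g is the last node in the graph
```

The outbound lists are never passed in by the caller; they are filled in as a side effect of constructing downstream nodes, which is why every node needs its own fresh list.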

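class Input(Node):                      # input node type
    def __init__(self):
        # An Input has no inbound nodes.
        Node.__init__(self)

    def forward(self):
        # Nothing to compute: the value is set from feed_dict or assigned directly.
        pass

    def backward(self):
        # An Input's gradient with respect to itself is the sum of the
        # gradients flowing back from every node that consumes it.
        self.gradients = {self: 0}
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            self.gradients[self] += grad_cost * 1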

Input is a subclass of Node that represents every kind of value fed into the network: X, W, b, and y.

class Linear(Node):
    """
    Represents a node that performs a linear transform.
    """
    def __init__(self, X, W, b):
        # The base class (Node) constructor. Weights and bias
        # are treated like inbound nodes.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Performs the math behind a linear transform.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        """
        Calculates the gradient based on the output values.
        """
        # Initialize a partial for each of the inbound_nodes.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        # Cycle through the outputs. The gradient will change depending
        # on each output, so the gradients are summed over all outputs.
        for n in self.outbound_nodes:
            # Get the partial of the cost with respect to this node.
            grad_cost = n.gradients[self]
            # Set the partial of the loss with respect to this node's inputs.
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # Set the partial of the loss with respect to this node's weights.
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # Set the partial of the loss with respect to this node's bias.
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)  # axis=0 sums down each column, i.e. over the batch

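Linear is a subclass of Node that represents the linear combination performed in every hidden layer. Its value is the previous layer's output multiplied by the weight node's value, plus the bias node's value. When computing its gradients, take care that the matrix shapes line up.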
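The shape bookkeeping is easier to see with concrete numbers. The sketch below reproduces the three gradient formulas from `Linear.backward` with plain NumPy arrays; the sizes (a batch of 4, 3 inputs, 2 outputs) and the random upstream gradient `grad_Z` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 4, 3, 2          # hypothetical sizes

X = rng.normal(size=(batch, n_in))
W = rng.normal(size=(n_in, n_out))
b = np.zeros(n_out)

Z = X @ W + b                          # forward pass, shape (batch, n_out)
grad_Z = rng.normal(size=Z.shape)      # stand-in for the gradient from the next node

grad_X = grad_Z @ W.T                  # (batch, n_out) @ (n_out, n_in) -> (batch, n_in)
grad_W = X.T @ grad_Z                  # (n_in, batch) @ (batch, n_out) -> (n_in, n_out)
grad_b = grad_Z.sum(axis=0)            # sum over the batch -> (n_out,)

# Each gradient has the same shape as the quantity it belongs to.
print(grad_X.shape == X.shape, grad_W.shape == W.shape, grad_b.shape == b.shape)

# Finite-difference check of one weight entry, using the scalar
# surrogate loss sum(Z * grad_Z), whose gradient w.r.t. Z is grad_Z.
eps = 1e-6
W_shift = W.copy()
W_shift[0, 1] += eps
numeric = (np.sum((X @ W_shift + b) * grad_Z) - np.sum(Z * grad_Z)) / eps
print(abs(numeric - grad_W[0, 1]) < 1e-6)
```

A quick rule of thumb follows from this: the gradient for any array must have that array's shape, which usually dictates where the transpose goes.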

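class Sigmoid(Node):               # activation function node type
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def forward(self):
        self.value = self._sigmoid(self.inbound_nodes[0].value)

    def backward(self):
        # gradients must stay a dictionary keyed by the inbound node.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            # Chain rule: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)).
            self.gradients[self.inbound_nodes[0]] += grad_cost * self.value * (1 - self.value)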

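Sigmoid is a subclass of Node that represents the nonlinear function applied in every hidden layer. Because it has a single node before it and a single node after it, its gradient is simple to compute; the key step is differentiating the activation function.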
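The derivative used in the backward pass is sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). As a sanity check, the sketch below compares that closed form with a central finite difference; the sample points and step size are arbitrary choices for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.linspace(-3.0, 3.0, 7)

# Closed-form derivative, as used in the backward pass.
s = sigmoid(x)
analytic = s * (1.0 - s)

# Central finite difference as an independent estimate.
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)

max_err = np.max(np.abs(analytic - numeric))
print(max_err < 1e-8)
```

Because the derivative can be written in terms of the forward value, the node only needs to keep `self.value` around and never has to recompute the exponential.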

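class MSE(Node):                    # loss function node type
    def __init__(self, y, a):
        # y holds the targets and a the network output, matching MSE(y, l2) in nn.py.
        Node.__init__(self, [y, a])

    def forward(self):
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        # Store m and diff on the node so that backward() can reuse them.
        self.m = self.inbound_nodes[0].value.shape[0]
        self.diff = y - a
        self.value = np.mean(self.diff ** 2)

    def backward(self):
        self.gradients[self.inbound_nodes[0]] = (2. / self.m) * self.diff
        self.gradients[self.inbound_nodes[1]] = (-2. / self.m) * self.diff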

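MSE is a subclass of Node that represents the network's loss function. It is the final node of the forward pass and the starting point of the backward pass, and it corresponds to the outermost function when the whole network is viewed as one composite function.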
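A small worked example may help here. With cost = mean((y - a)^2) over m examples, the partial derivatives are (2/m)(y - a) with respect to y and (-2/m)(y - a) with respect to a. The three target and prediction values below are made up for illustration.

```python
import numpy as np

# Hypothetical targets and predictions, as column vectors.
y = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
a = np.array([1.5, 1.0, 2.0]).reshape(-1, 1)

m = y.shape[0]
diff = y - a                      # [-0.5, 1.0, 1.0]
cost = np.mean(diff ** 2)         # (0.25 + 1.0 + 1.0) / 3 = 0.75

grad_y = (2.0 / m) * diff         # d cost / d y
grad_a = (-2.0 / m) * diff        # d cost / d a

print(cost)
print(np.allclose(grad_a, -grad_y))
```

Only `grad_a` matters for training, since it is the gradient that flows back into the network; `grad_y` is computed for completeness because y is also an inbound node.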

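def forward_and_backward(graph):
    # Forward pass: visit the nodes in topological order.
    for n in graph:
        n.forward()

    # Backward pass: visit the nodes in reverse order, starting from the cost node.
    for n in graph[::-1]:
        n.backward()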

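forward_and_backward calls forward on every node in topological order and then backward on every node in reverse order, which is what drives the network. These two methods are all the training script needs to call on a node. The constructor __init__ is not called by name; Python invokes it automatically when a node is created. Its double leading and trailing underscores mark it as a special method rather than a private one, so it remains accessible, and the subclasses call Node.__init__ explicitly to set up the shared state.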

def topological_sort(feed_dict):
    """
    Sort the nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is a `Input` Node and the value is the respective value feed to that Node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L

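topological_sort performs a topological sort on the directed graph formed by all the nodes, producing the order in which the network runs its forward pass. Its input is a dictionary whose keys are the Input nodes and whose values are the values to feed into them; its output is a list of all nodes in sorted order.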
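Kahn's algorithm itself is independent of the Node classes. The sketch below applies it to a plain dictionary of edges that mirrors the network built in nn.py; the function name `kahn_sort` and the string node names are assumptions for this example, and it tracks in-degree counts rather than the `in`/`out` sets used above.

```python
from collections import deque


def kahn_sort(edges):
    """Topologically sort a graph given as {node: [nodes it feeds into]}."""
    # Count incoming edges for every node.
    in_degree = {n: 0 for n in edges}
    for outs in edges.values():
        for m in outs:
            in_degree[m] = in_degree.get(m, 0) + 1

    # Start from the nodes nothing feeds into (the inputs).
    ready = deque(n for n, d in in_degree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in edges.get(n, []):
            in_degree[m] -= 1
            if in_degree[m] == 0:      # all of m's inputs are already placed
                ready.append(m)

    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order


edges = {
    "X": ["l1"], "W1": ["l1"], "b1": ["l1"],
    "l1": ["s1"], "s1": ["l2"],
    "W2": ["l2"], "b2": ["l2"],
    "l2": ["cost"], "y": ["cost"], "cost": [],
}
order = kahn_sort(edges)
print(order)
```

Whatever order the inputs come out in, every node appears after all of the nodes that feed it, so running forward over the list front to back is always valid, and running backward over it in reverse is too.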

def sgd_update(trainables, learning_rate=1e-1):

    for t in trainables:
        partial = t.gradients[t]
        t.value -= learning_rate * partial

sgd_update updates the trainable parameters: it moves each one a small step against its gradient. The default learning rate is 1e-1.
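The update rule is the same as plain gradient descent on any differentiable function. As a toy illustration, the sketch below applies it to f(w) = (w - 3)^2, whose minimum is at w = 3; the helper name `sgd_step` and the toy objective are assumptions and are not part of miniflow.py.

```python
import numpy as np


def sgd_step(values, grads, learning_rate=1e-1):
    """Move each parameter array a small step against its gradient, in place."""
    for v, g in zip(values, grads):
        v -= learning_rate * g


# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    sgd_step([w], [grad])

print(w)
```

With a learning rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps are more than enough here. A learning rate that is too large would make the iterates overshoot and diverge instead.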

nn.py

import numpy as np
from sklearn.datasets import load_boston
from sklearn.utils import shuffle, resample
from miniflow import *

# Load data
# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older release.
data = load_boston()
X_ = data['data']
y_ = data['target']

# Normalize data
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)

n_features = X_.shape[1]
n_hidden = 10
W1_ = np.random.randn(n_features, n_hidden)
b1_ = np.zeros(n_hidden)
W2_ = np.random.randn(n_hidden, 1)
b2_ = np.zeros(1)

# Neural network
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

feed_dict = {
    X: X_,
    y: y_,
    W1: W1_,
    b1: b1_,
    W2: W2_,
    b2: b2_
}

epochs = 10
# Total number of examples
m = X_.shape[0]
batch_size = 11
steps_per_epoch = m // batch_size

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2]

print("Total number of examples = {}".format(m))

# Step 4
for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Step 1
        # Randomly sample a batch of examples
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)

        # Reset value of X and y Inputs
        X.value = X_batch
        y.value = y_batch

        # Step 2
        forward_and_backward(graph)

        # Step 3
        sgd_update(trainables)

        loss += graph[-1].value

    print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))


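The steps of stochastic gradient descent:
(1) Randomly sample a batch of data from the full dataset.
(2) Run the network forward and backward to compute the gradients, using the batch from step (1).
(3) Apply the gradient descent update.
(4) Repeat steps 1-3 until the loss converges or the loop is stopped by some other condition, such as a fixed number of iterations.
In other words, each iteration computes the gradient from a randomly drawn subset of the training data rather than from the whole training set, and the parameters are updated after every batch until the loss converges.
In principle, a validation set should also be held out from the training data and used to check the trained network. This guards against a network that fits the training set very well but cannot cope with new test data, so that what we end up with is a good "learner".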
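The four steps and the held-out validation set can be sketched together in a few lines of NumPy. Everything in this example is an assumption made for illustration: the data is synthetic rather than the housing data, the model is a plain linear regression rather than the MiniFlow graph, and the 80/20 split, batch size, and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical stand-in for a real dataset).
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Hold out 20% of the examples as a validation set.
idx = rng.permutation(len(X))
n_val = len(X) // 5
val_idx, train_idx = idx[:n_val], idx[n_val:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]


def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)


w = np.zeros(3)
learning_rate, batch_size, epochs = 0.1, 16, 20
steps_per_epoch = len(X_train) // batch_size
initial_val_loss = mse(w, X_val, y_val)

for epoch in range(epochs):                                   # step 4: repeat
    for _ in range(steps_per_epoch):
        # Step 1: sample a random batch (with replacement).
        batch = rng.choice(len(X_train), size=batch_size, replace=True)
        Xb, yb = X_train[batch], y_train[batch]
        # Step 2: gradient of the batch MSE with respect to w.
        grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)
        # Step 3: gradient descent update.
        w -= learning_rate * grad

final_val_loss = mse(w, X_val, y_val)
print("validation loss: {:.3f} -> {:.3f}".format(initial_val_loss, final_val_loss))
```

The validation examples are never used to compute a gradient, so a validation loss that falls along with the training loss is evidence that the model generalises rather than memorises.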