I have searched everywhere for this exact problem, stayed up several nights, and looked at many different backpropagation implementations (including on Stack Overflow), but I can't seem to understand how they work.
I'm currently taking Andrew Ng's Coursera Machine Learning course, which is great, but the backprop implementation shown in the course is very different from what I've seen elsewhere on the internet.
I'm having trouble understanding the dimensions and computing the delta for each weight. I'd really appreciate it if someone could walk me through exactly what happens in backpropagation. I have no problems with forward prop.
Here is my code (skip to the first for loop).

import numpy as np
import sys
x_train = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0]
])
y_train = np.array([
    [1],
    [1],
    [0],
    [0]
])
learning_rate = 0.03
reg_param = 0.5
num_h_units = 5
max_iter = 60000 # for gradient descent
m = 4 # number of training examples
np.random.seed(1)
weights1 = np.random.random((x_train.shape[1], num_h_units)) # 3x5 (Including bias)
weights2 = np.random.random((num_h_units + 1, 1)) # 6x1 (Including bias)
def sigmoid(z, derv=False):
    if derv: return z * (1 - z)
    return 1 / (1 + np.exp(-z))
def forward(x, predict=False):
    a1 = x # 1x3
    a1.shape = (1, a1.shape[0]) # Reshaping now, to avoid reshaping the other activations.
    a2 = np.insert(sigmoid(a1.dot(weights1)), 0, 1, axis=1) # 1x3 * 3x5 = 1x5 + bias = 1x6
    a3 = sigmoid(a2.dot(weights2)) # 1x6 * 6x1 = 1x1
    if predict: return a3
    return (a1, a2, a3)
for i in range(max_iter):
    # Reset the accumulated gradients at the start of every epoch; without
    # this, gradients from all previous epochs keep piling up.
    w_grad1 = 0
    w_grad2 = 0
    for j in range(m):
        sys.stdout.write("\rIteration: {} and {}".format(i + 1, j + 1))
        a1, a2, a3 = forward(x_train[j])
        delta3 = np.multiply((a3 - y_train[j]), sigmoid(a3, derv=True)) # 1x1
        # (1x6 * 1x1) .* 1x6 = 1x6 (here, ".*" stands for element-wise mult)
        delta2 = np.multiply((weights2.T * delta3), sigmoid(a2, derv=True))
        delta2 = delta2[:, 1:] # Dropping the bias error, since the bias unit has no incoming weights.
        # 3x1 * 1x5 = 3x5 (gradient of all the weights connecting input to hidden)
        w_grad1 += (1 / m) * a1.T.dot(delta2)
        # 6x1 * 1x1 = 6x1 (row 0 of a2 is the bias activation 1, so this also
        # gives the gradient for the output bias; zeroing a2[:, 0] here would
        # freeze the output bias at its random initial value)
        w_grad2 += (1 / m) * a2.T.dot(delta3)
        sys.stdout.flush() # Updating the text.
    weights1 -= learning_rate * w_grad1
    weights2 -= learning_rate * w_grad2
# Outputting all the outputs at once.
a1_full = x_train
a2_full = np.insert(sigmoid(a1_full.dot(weights1)), 0, 1, axis=1)
a3_full = sigmoid(a2_full.dot(weights2))
print(a3_full)
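As an aside, one subtlety in the sigmoid helper above: `derv=True` assumes it is handed an activation `a = sigmoid(z)`, not a raw pre-activation `z`, which is why `a3` and `a2` are passed into it directly. A quick standalone check of that convention (my own sketch, not part of the training script):

```python
import numpy as np

def sigmoid(z, derv=False):
    # When derv=True, z is assumed to ALREADY be an activation a = sigmoid(z).
    if derv: return z * (1 - z)
    return 1 / (1 + np.exp(-z))

z = 0.5
a = sigmoid(z)                    # forward pass
analytic = sigmoid(a, derv=True)  # correct usage: pass the activation, not z

# Numerical derivative of sigmoid at z confirms a * (1 - a) is right:
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)
```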
Here is the output I get:
There are also some things I don't understand. In the Coursera course, delta3 is computed as a3 - target, but everywhere else I've seen it computed as (a3 - target) * sigmoid(a3, derv=True). I'm confused: which one is correct, and why?
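From what I can tell, both formulas can be correct, just for different loss functions: with binary cross-entropy the sigmoid derivative cancels algebraically and delta3 = a3 - y, while with squared error the extra sigmoid(a3, derv=True) factor remains. A minimal sketch (my own, not from the course) checking both against numerical derivatives:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 0.7          # pre-activation of the output unit (made-up value)
y = 1.0          # target
a = sigmoid(z)

# Squared-error loss L = 0.5*(a - y)^2  =>  dL/dz = (a - y) * a * (1 - a)
mse_delta = (a - y) * a * (1 - a)

# Cross-entropy loss L = -(y*log(a) + (1-y)*log(1-a))  =>  dL/dz = a - y
ce_delta = a - y

# Central-difference check of each loss's derivative w.r.t. z:
eps = 1e-6
def mse(z): a = sigmoid(z); return 0.5 * (a - y) ** 2
def ce(z):  a = sigmoid(z); return -(y * np.log(a) + (1 - y) * np.log(1 - a))

mse_num = (mse(z + eps) - mse(z - eps)) / (2 * eps)
ce_num  = (ce(z + eps) - ce(z - eps)) / (2 * eps)
print(abs(mse_delta - mse_num) < 1e-8, abs(ce_delta - ce_num) < 1e-8)
```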
In many implementations, people don't scale the gradient with a learning rate or with (1/m). Are the learning rate and (1/m) optional?
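One thing I noticed while comparing implementations: averaging the per-example gradients with (1/m) versus just summing them only rescales the step, and that scale can be absorbed into the learning rate. A tiny sketch (my own illustration, with made-up gradient values):

```python
import numpy as np

grads = np.array([0.4, -0.2, 0.1, 0.3])  # per-example gradients of one weight
lr = 0.03
m = len(grads)

step_avg = lr * grads.sum() / m   # averaged gradient with learning rate lr
step_sum = (lr / m) * grads.sum() # summed gradient with learning rate lr/m
print(np.isclose(step_avg, step_sum))
```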
What should we do with the biases? Update them, or not? In many other implementations I've also seen people simply updating the biases.
Is there a fixed position where the bias should go, like the first column or the last column, etc.?
Do I need np.insert() to add the bias column in the calculations?
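On the bias-position question, as far as I can tell the column position is just a convention: the bias can go first or last as long as the matching row of the weight matrix moves with it. A minimal sketch (my own, with made-up values):

```python
import numpy as np

np.random.seed(0)
x = np.array([[0.2, 0.9]])            # one training example, 1x2
W = np.random.randn(3, 1)             # 3 rows: bias weight first, then two input weights

# Bias prepended: column 0 is the bias, so row 0 of W is the bias weight.
x_first = np.insert(x, 0, 1, axis=1)  # 1x3 -> [[1, 0.2, 0.9]]

# Bias appended: the last column is the bias, so the LAST row must be the bias weight.
x_last = np.append(x, [[1]], axis=1)  # 1x3 -> [[0.2, 0.9, 1]]
W_last = np.roll(W, -1, axis=0)       # move the bias weight to the last row

# Both conventions give the same pre-activation.
print(np.allclose(x_first.dot(W), x_last.dot(W_last)))
```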
I'm really lost on this, so thanks in advance. I thought I understood backpropagation, but implementing it is an absolute nightmare.