实现了一个神经网络-二进制分类
让我们在python中实现一个用于二进制分类的基本神经网络,该神经网络用于在给定图像为0或1时进行分类。
In [277]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline
import itertools
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn import metrics
2.1数据准备
第一步是加载和准备数据集
数据来源:https://github.com/woshizhangrong/train_raw
In [278]:
train = pd.read_csv("E:/Digit_Recognizer/train1.csv")
In [279]:
sns.countplot(train['label'])
Out[279]:
<matplotlib.axes._subplots.AxesSubplot at 0x243f59e8>
In [280]:
# Check for null and missing values
train.isnull().any().describe()
Out[280]:
count 785
unique 1
top False
freq 785
dtype: object
我检查损坏的图像(里面缺少值)。
在训练和测试数据集中没有缺失值。所以我们可以安全的继续。
In [281]:
# include only the rows having label = 0 or 1 (binary classification)
X = train[train['label'].isin([0, 1])]
# target variable
Y = train[train['label'].isin([0, 1])]['label']
# remove the label from X
X = X.drop(['label'], axis = 1)
2.2实现激活函数
我们将使用sigmoid激活函数,因为它输出0和1之间的值,所以这是一个很好的选择,对于二元分类问题
In [282]:
# implementing a sigmoid activation function
def sigmoid(z):
s = 1.0/ (1 + np.exp(-z))
return s
2.3定义神经网络架构
创建一个有三层的模型——输入,隐藏,输出。
In [283]:
def network_architecture(X, Y):
# nodes in input layer
n_x = X.shape[0]
# nodes in hidden layer
n_h = 10
# nodes in output layer
n_y = Y.shape[0]
return (n_x, n_h, n_y)
2.4定义神经网络参数
神经网络参数是权重和偏差,我们需要初始化为零值。第一层只包含输入,所以没有权重和偏差,但是隐藏层和输出层有一个权重和偏差项。(W1、b1、W2、b2)
In [284]:
def define_network_parameters(n_x, n_h, n_y):
W1 = np.random.randn(n_h,n_x) * 0.01 # random initialization
b1 = np.zeros((n_h, 1)) # zero initialization
W2 = np.random.randn(n_y,n_h) * 0.01
b2 = np.zeros((n_y, 1))
return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
2.5实现正向传播
隐藏层和输出层将使用sigmoid激活函数计算激活,并将其向前传递。在计算这个激活时,在将输入传递给函数之前,输入要乘以权重并加上偏差。
In [285]:
def forward_propagation(X, params):
Z1 = np.dot(params['W1'], X)+params['b1']
A1 = sigmoid(Z1)
Z2 = np.dot(params['W2'], A1)+params['b2']
A2 = sigmoid(Z2)
return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
2.6计算网络误差
为了计算成本,一种直接的方法是计算预测值与实际值之间的绝对误差。但更好的损失函数是对数损失函数,其定义为:
-Summ ( Log (Pred) Actual + Log (1 - Pred ) Actual ) / m
In [286]:
def compute_error(Predicted, Actual):
logprobs = np.multiply(np.log(Predicted), Actual)+ np.multiply(np.log(1-Predicted), 1-Actual)
cost = -np.sum(logprobs) / Actual.shape[1]
return np.squeeze(cost)
2.7实现反向传播
在反向传播函数中,将误差反向传递到前一层,计算权值和偏差的导数。然后使用导数更新权重和偏差。
In [287]:
def backward_propagation(params, activations, X, Y):
m = X.shape[1]
# output layer
dZ2 = activations['A2'] - Y # compute the error derivative
dW2 = np.dot(dZ2, activations['A1'].T) / m # compute the weight derivative
db2 = np.sum(dZ2, axis=1, keepdims=True)/m # compute the bias derivative
# hidden layer
dZ1 = np.dot(params['W2'].T, dZ2)*(1-np.power(activations['A1'], 2))
dW1 = np.dot(dZ1, X.T)/m
db1 = np.sum(dZ1, axis=1,keepdims=True)/m
return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
def update_parameters(params, derivatives, alpha = 1.2):
# alpha is the model's learning rate
params['W1'] = params['W1'] - alpha * derivatives['dW1']
params['b1'] = params['b1'] - alpha * derivatives['db1']
params['W2'] = params['W2'] - alpha * derivatives['dW2']
params['b2'] = params['b2'] - alpha * derivatives['db2']
return params
2.8模型的编写和培训
创建一个编译所有关键函数的函数,并创建一个神经网络模型。
In [288]:
def neural_network(X, Y, n_h, num_iterations=100):
n_x = network_architecture(X, Y)[0]
n_y = network_architecture(X, Y)[2]
params = define_network_parameters(n_x, n_h, n_y)
for i in range(0, num_iterations):
results = forward_propagation(X, params)
error = compute_error(results['A2'], Y)
derivatives = backward_propagation(params, results, X, Y)
params = update_parameters(params, derivatives)
return params
In [289]:
y = Y.values.reshape(1, Y.size)
x = X.T.values
model = neural_network(x, y, n_h = 10, num_iterations = 10)
d:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
This is separate from the ipykernel package so we can avoid doing imports until
2.9预测
In [290]:
def predict(parameters, X):
results = forward_propagation(X, parameters)
print (results['A2'][0])
predictions = np.around(results['A2'])
return predictions
predictions = predict(model, x)
print ('Accuracy: %d' % float((np.dot(y,predictions.T) + np.dot(1-y,1-predictions.T))/float(y.size)*100) + '%')
[0.96541361 0.40257492 0.96541361 ... 0.96541361 0.40257492 0.96541361]
Accuracy: 94%
d:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
This is separate from the ipykernel package so we can avoid doing imports until
混淆矩阵
混淆矩阵可以非常有助于看到您的模型的缺点。
In [291]:
def plot_confusion_matrix(cm, classes,
normalize=False,
title='Confusion matrix',
cmap=plt.cm.Blues):
"""
This function prints and plots the confusion matrix.
Normalization can be applied by setting `normalize=True`.
"""
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
if normalize:
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, cm[i, j],
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
In [292]:
confusion_mtx = confusion_matrix(predictions.reshape(-1,1), y.reshape(-1,1))
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(2))
3.实现了神经网络的多类分类
在前面的步骤中,我讨论了如何在python中从头开始实现用于二进制分类的NN。Python的库(如sklearn)提供了高效的神经网络的优秀实现,可以直接在数据集上实现神经网络。在本节中,让我们实现一个多类神经网络来对图像中显示的数字进行从0到9的分类
3.1数据准备
将列车数据集分割为列车和验证集
In [293]:
Y = train['label'][:20000] # use more number of rows for more training
X = train.drop(['label'], axis = 1)[:20000] # use more number of rows for more training
x_train, x_val, y_train, y_val = train_test_split(X, Y, test_size=0.20, random_state=42)
In [294]:
# Some examples
plt.imshow(x_train.iloc[0].values.reshape(28,28))
Out[294]:
<matplotlib.image.AxesImage at 0x1c155588>
3.2模型的训练
训练具有10个隐含层的神经网络模型。
In [295]:
from sklearn import neural_network
model = neural_network.MLPClassifier(alpha=1e-5, hidden_layer_sizes=(80,), solver='lbfgs', random_state=18)
model.fit(x_train, y_train)
Out[295]:
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(80,), learning_rate='constant',
learning_rate_init=0.001, max_iter=200, momentum=0.9,
nesterovs_momentum=True, power_t=0.5, random_state=18, shuffle=True,
solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
warm_start=False)
3.3 预测
In [296]:
predicted = model.predict(x_val)
print("Classification Report:\n %s:" % (metrics.classification_report(y_val, predicted)))
print(model.score(x_val,y_val))
Classification Report:
precision recall f1-score support
0 0.95 0.98 0.96 390
1 0.98 0.98 0.98 483
2 0.92 0.91 0.91 386
3 0.92 0.89 0.90 412
4 0.92 0.93 0.92 379
5 0.91 0.91 0.91 355
6 0.96 0.95 0.95 396
7 0.94 0.93 0.93 452
8 0.88 0.92 0.90 356
9 0.89 0.89 0.89 391
avg / total 0.93 0.93 0.93 4000
:
0.92825
In [297]:
confusion_mtx = confusion_matrix(predicted, y_val)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(10))
In [298]:
# Display some error results
def display_errors(errors_index,img_errors,pred_errors, obs_errors):
""" This function shows 6 images with their predicted and real labels"""
n = 0
nrows = 2
ncols = 3
fig, ax = plt.subplots(nrows,ncols,sharex=True,sharey=True)
for row in range(nrows):
for col in range(ncols):
error = errors_index[n]
ax[row,col].imshow(img_errors.iloc[error].values.reshape(28,28))
ax[row,col].set_title("Predicted label :{}\nTrue label :{}".format(pred_errors[error],obs_errors.values[error]))
n += 1
In [299]:
# Errors are difference between predicted labels and true labels
errors = (predicted - y_val != 0)
pred_classes_errors = predicted[errors]
image_errors = x_val[errors]
true_classes_errors = y_val[errors]
error_idx = np.random.randint(low=0, high=len(image_errors), size=6)
In [300]:
# Show the 6 errors
display_errors(error_idx,image_errors ,pred_classes_errors, true_classes_errors)