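Before we begin: this post follows directly on from the previous one, "Deep Learning Notes: Code Practice Based on TensorFlow 2.2.0 (Lesson 1)". It covers TensorFlow 2.0 code practice, following KGP Talkie's "TensorFlow 2.0 Hands-On Advanced Tutorial", and corrects the pieces of course code that no longer work. This post follows Lesson 2 of that tutorial, which is very popular on YouTube (Chinese and English subtitles, with hands-on code).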
Data needed for the course: https://pan.baidu.com/s/1Lpo3l3UaPANOGE_HGJf2TQ (extraction code: dqo4). Note: the data must be placed in your Jupyter working directory.
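How to build your first ANN
1. Process the data
2. Build the input layer
3. Randomly initialize the input weights W
4. Build the hidden layers
5. Choose the optimizer, loss, and accuracy metric
6. Compile the model
7. Train the model with model.fit
8. Evaluate the model
9. Tune the model if necessary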
import tensorflow as tf
from tensorflow import keras
# Import from the public tensorflow.keras API rather than the private tensorflow.python package
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Load the customer churn dataset from the Jupyter working directory
dataset = pd.read_csv('customer_Churn_Modelling.csv')
dataset.head()
   RowNumber  CustomerId   Surname  CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0          1    15634602  Hargrave          619     France  Female   42       2       0.00              1          1               1        101348.88       1
1          2    15647311      Hill          608      Spain  Female   41       1   83807.86              1          0               1        112542.58       0
2          3    15619304      Onio          502     France  Female   42       8  159660.80              3          1               0        113931.57       1
3          4    15701354      Boni          699     France  Female   39       1       0.00              2          0               0         93826.63       0
4          5    15737888  Mitchell          850      Spain  Female   43       2  125510.82              1          1               1         79084.10       0
# Drop the identifier columns and the target; 'Exited' is the label to predict
X = dataset.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis=1)
y = dataset['Exited']
X.head()
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary
0          619     France  Female   42       2       0.00              1          1               1        101348.88
1          608      Spain  Female   41       1   83807.86              1          0               1        112542.58
2          502     France  Female   42       8  159660.80              3          1               0        113931.57
3          699     France  Female   39       1       0.00              2          0               0         93826.63
4          850      Spain  Female   43       2  125510.82              1          1               1         79084.10
y.head()
0 1
1 0
2 1
3 0
4 0
Name: Exited, dtype: int64
from sklearn.preprocessing import LabelEncoder
label1 = LabelEncoder()
X['Geography'] = label1.fit_transform(X['Geography'])
X.head()
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary
0          619          0  Female   42       2       0.00              1          1               1        101348.88
1          608          2  Female   41       1   83807.86              1          0               1        112542.58
2          502          0  Female   42       8  159660.80              3          1               0        113931.57
3          699          0  Female   39       1       0.00              2          0               0         93826.63
4          850          2  Female   43       2  125510.82              1          1               1         79084.10
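LabelEncoder assigns integer codes by sorting the distinct labels, which is why France becomes 0 and Spain becomes 2 in the output above. The following is only a pure-Python sketch of that mapping, not scikit-learn's implementation. The helper name `encode_labels` is hypothetical, and the sample values (including Germany, which does not appear in the five rows shown) are assumed for illustration.

```python
def encode_labels(values):
    """Map each distinct label to its index in sorted order (hypothetical helper)."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


# Sample values assumed for illustration only
geography = ["France", "Spain", "France", "Germany", "Spain"]
codes, classes = encode_labels(geography)

print(classes)  # the sorted distinct labels
print(codes)    # one integer code per input value
```

Because the codes follow alphabetical order, they carry no real ranking, which is one reason the Geography column is one-hot encoded later.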
label2 = LabelEncoder()
# Use the new encoder for Gender so that label1 keeps its Geography mapping
X['Gender'] = label2.fit_transform(X['Gender'])
X.head()
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary
0          619          0       0   42       2       0.00              1          1               1        101348.88
1          608          2       0   41       1   83807.86              1          0               1        112542.58
2          502          0       0   42       8  159660.80              3          1               0        113931.57
3          699          0       0   39       1       0.00              2          0               0         93826.63
4          850          2       0   43       2  125510.82              1          1               1         79084.10
# One-hot encode Geography; drop_first=True drops the first category's column to avoid a redundant feature
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head(30)
    CreditScore  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Geography_1  Geography_2
 0          619       0   42       2       0.00              1          1               1        101348.88            0            0
 1          608       0   41       1   83807.86              1          0               1        112542.58            0            1
 2          502       0   42       8  159660.80              3          1               0        113931.57            0            0
 3          699       0   39       1       0.00              2          0               0         93826.63            0            0
 4          850       0   43       2  125510.82              1          1               1         79084.10            0            1
 5          645       1   44       8  113755.78              2          1               0        149756.71            0            1
 6          822       1   50       7       0.00              2          1               1         10062.80            0            0
 7          376       0   29       4  115046.74              4          1               0        119346.88            1            0
 8          501       1   44       4  142051.07              2          0               1         74940.50            0            0
 9          684       1   27       2  134603.88              1          1               1         71725.73            0            0
10          528       1   31       6  102016.72              2          0               0         80181.12            0            0
11          497       1   24       3       0.00              2          1               0         76390.01            0            1
12          476       0   34      10       0.00              2          1               0         26260.98            0            0
13          549       0   25       5       0.00              2          0               0        190857.79            0            0
14          635       0   35       7       0.00              2          1               1         65951.65            0            1
15          616       1   45       3  143129.41              2          0               1         64327.26            1            0
16          653       1   58       1  132602.88              1          1               0          5097.67            1            0
17          549       0   24       9       0.00              2          1               1         14406.41            0            1
18          587       1   45       6       0.00              1          0               0        158684.81            0            1
19          726       0   24       6       0.00              2          1               1         54724.03            0            0
20          732       1   41       8       0.00              2          1               1        170886.17            0            0
21          636       0   32       8       0.00              2          1               0        138555.46            0            1
22          510       0   38       4       0.00              1          1               0        118913.53            0            1
23          669       1   46       3       0.00              2          0               1          8487.75            0            0
24          846       0   38       5       0.00              1          1               1        187616.16            0            0
25          577       1   25       3       0.00              2          0               1        124508.29            0            0
26          756       1   36       2  136815.64              1          1               1        170041.95            1            0
27          571       1   44       9       0.00              2          0               0         38433.35            0            0
28          574       0   43       3  141349.43              1          1               1        100187.43            1            0
29          411       1   29       0   59697.17              2          1               1         53483.21            0            0
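In the table above, a row with Geography code 0 has zeros in both Geography_1 and Geography_2, because dropping the first category makes it the implicit baseline. The pure-Python sketch below reproduces that pattern for three integer codes. The helper name `one_hot_drop_first` is hypothetical and is not how pandas implements get_dummies.

```python
def one_hot_drop_first(codes, n_classes):
    """One-hot encode integer codes, omitting the column for class 0 (hypothetical helper)."""
    return [[1 if code == k else 0 for k in range(1, n_classes)] for code in codes]


codes = [0, 2, 1]  # the three Geography codes
rows = one_hot_drop_first(codes, n_classes=3)

for code, row in zip(codes, rows):
    print(code, row)  # columns correspond to Geography_1, Geography_2
```

Two columns are enough to distinguish three categories, so the dropped column would add no information.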
Feature standardization
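from sklearn.preprocessing import StandardScaler
# Hold out 20% for testing; stratify=y keeps the churn ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
scaler = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics on the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
y_test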
1344 1
8167 0
4747 0
5004 1
3124 1
..
9107 0
8249 0
8337 0
6279 1
412 0
Name: Exited, Length: 2000, dtype: int64
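Standardization rescales each feature to zero mean and unit variance. The NumPy sketch below shows the arithmetic, assuming (as is common practice) that the mean and standard deviation are learned from the training set and then applied unchanged to the test set. The toy matrices are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# Toy feature matrices, invented for illustration only
train = np.array([[600.0, 40.0], [700.0, 30.0], [500.0, 50.0]])
test = np.array([[650.0, 35.0]])

# "fit": learn the per-column mean and standard deviation from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

# "transform": apply the same statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately zero in every column
print(test_scaled)
```

Reusing the training statistics keeps the test features on the same scale the model saw during training.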
Building the ANN
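model = Sequential()
# First hidden layer: one unit per input feature
model.add(Dense(X.shape[1], activation='relu', input_dim=X.shape[1]))
# Second hidden layer with 128 units
model.add(Dense(128, activation='relu'))
# Output layer: a single sigmoid unit gives the churn probability
model.add(Dense(1, activation='sigmoid'))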
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
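To get a feel for the size of this network, the sketch below counts the trainable parameters of the three Dense layers, assuming 11 input features (the number of columns in X after one-hot encoding). Each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + n_units


n_features = 11                     # assumed from the 11 columns of X shown earlier
layer_units = [n_features, 128, 1]  # units in the three Dense layers

total = 0
n_in = n_features
for units in layer_units:
    params = dense_params(n_in, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    n_in = units  # this layer's output feeds the next layer

print("total:", total)
```

Calling `model.summary()` on the Keras model should report the same per-layer counts.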
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train.to_numpy(), batch_size=10, epochs=10, verbose=1)
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/10
8000/8000 [==============================] - 1s 94us/sample - loss: 0.4515 - acc: 0.8049
Epoch 2/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4185 - acc: 0.8202
Epoch 3/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4057 - acc: 0.8324
Epoch 4/10
8000/8000 [==============================] - 1s 77us/sample - loss: 0.3752 - acc: 0.8431
Epoch 5/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3507 - acc: 0.8571
Epoch 6/10
8000/8000 [==============================] - 1s 78us/sample - loss: 0.3415 - acc: 0.8591
Epoch 7/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3363 - acc: 0.8620
Epoch 8/10
8000/8000 [==============================] - 1s 84us/sample - loss: 0.3345 - acc: 0.8619
Epoch 9/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3328 - acc: 0.8602
Epoch 10/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3302 - acc: 0.8626
<tensorflow.python.keras.callbacks.History at 0x1d77c75d248>
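# predict_classes was removed in newer TensorFlow releases; threshold the sigmoid output at 0.5 instead
y_pred = (model.predict(X_test) > 0.5).astype(int)
y_pred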
array([[0],
[0],
[0],
...,
[0],
[1],
[0]])
y_test
1344 1
8167 0
4747 0
5004 1
3124 1
..
9107 0
8249 0
8337 0
6279 1
412 0
Name: Exited, Length: 2000, dtype: int64
model.evaluate(X_test, y_test.to_numpy())
2000/2000 [==============================] - 0s 34us/sample - loss: 0.3583 - acc: 0.8535
[0.3583366745710373, 0.8535]
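The first number returned by model.evaluate is the binary cross-entropy loss. As a rough illustration of what that number measures, here is a pure-Python sketch of the formula. The labels and probabilities are made up, and the clipping constant `eps` is an assumption chosen for this example, not a value taken from Keras.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities, for illustration only
y_true = [1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.1, 0.4]
loss = binary_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

Confident correct predictions contribute almost nothing to the loss, while the uncertain prediction of 0.4 for a true label of 1 contributes the most.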
from sklearn.metrics import confusion_matrix, accuracy_score
confusion_matrix(y_test, y_pred)
array([[1525, 68],
[ 225, 182]], dtype=int64)
accuracy_score(y_test, y_pred)
0.8535
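The accuracy can be recovered by hand from the confusion matrix above, along with precision and recall for the churn class. The sketch below uses the four counts exactly as reported; in scikit-learn's layout, rows are true classes and columns are predicted classes.

```python
# Counts copied from the confusion matrix above (rows: true class, columns: predicted class)
tn, fp = 1525, 68
fn, tp = 225, 182

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)  # of the customers predicted to churn, how many did
recall = tp / (tp + fn)     # of the customers who churned, how many were caught

print(f"accuracy={accuracy:.4f} precision={precision:.3f} recall={recall:.3f}")
```

The accuracy matches the 0.8535 reported by accuracy_score, but the recall shows that the model misses more than half of the customers who actually churned, which accuracy alone hides on an imbalanced dataset.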