Data Loading
keras.datasets
boston housing
- Boston housing price regression dataset
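A minimal loading sketch (shapes assume the default test_split of keras.datasets):
(x, y), (x_test, y_test) = keras.datasets.boston_housing.load_data()
x.shape, y.shape  # (404, 13), (404,) -- 13 numeric features, scalar house price target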
mnist/fashion mnist
- MNIST/Fashion-MNIST dataset
MNIST
(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()
x.shape # 60000,28,28
y.shape # 60000,
x_test.shape, y_test.shape
# (10000, 28, 28), (10000, )
y_onehot = tf.one_hot(y, depth=10)
cifar10/100
- small images classification dataset
- the 10/100 difference lies in how labels are subdivided under the coarse classes
- [32, 32, 3]
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
x.shape, y.shape, x_test.shape, y_test.shape
# ((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))
imdb
- sentiment classification dataset
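A minimal loading sketch (num_words=10000 and maxlen=80 are illustrative choices, not values from the text):
(x, y), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)
x = keras.preprocessing.sequence.pad_sequences(x, maxlen=80)  # pad/truncate each review to 80 word indices
x.shape, y.shape  # (25000, 80), (25000,) -- y is a 0/1 sentiment label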
tf.data.Dataset
- from_tensor_slices()
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
db = tf.data.Dataset.from_tensor_slices(x_test)
next(iter(db)).shape
# [32, 32, 3]
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
next(iter(db))[0].shape
# [32, 32, 3]
- .shuffle
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db = db.shuffle(10000)
- .map
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x, y
db2 = db.map(preprocess)
res = next(iter(db2))
res[0].shape, res[1].shape
# [32, 32, 3], [1, 10]
res[1][:2]
# shape:(1, 10) dtype = float32
- batch
db3 = db2.batch(32)
res = next(iter(db3))
res[0].shape, res[1].shape
# [32, 32, 32, 3], [32, 1, 10]
- StopIteration
db_iter = iter(db3)
while True:
    x, y = next(db_iter)
# raises an exception (OutOfRangeError / StopIteration) once db3 is exhausted
- .repeat()
db4 = db3.repeat()
db4 = db3.repeat(4)
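A small sketch of how batch and repeat compose (db3 is the batched dataset from above; the count assumes the 10000-sample cifar10 test split):
for step, (x, y) in enumerate(db3.repeat(2)):   # two full passes, then the loop ends cleanly
    pass
print(step + 1)  # 626 batches = 2 * ceil(10000 / 32)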
Example
def pre_mnist(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    y = tf.cast(y, tf.int64)
    return x, y

def mnist_dataset():
    (x, y), (x_val, y_val) = datasets.fashion_mnist.load_data()
    y = tf.one_hot(y, depth=10)
    y_val = tf.one_hot(y_val, depth=10)
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.map(pre_mnist)
    ds = ds.shuffle(60000).batch(100)
    ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    ds_val = ds_val.map(pre_mnist)
    ds_val = ds_val.shuffle(10000).batch(100)
    return ds, ds_val
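A hedged usage sketch of the helper above inside a training-loop skeleton (the actual model/optimizer step is only indicated by a comment):
ds, ds_val = mnist_dataset()
for epoch in range(5):
    for step, (x, y) in enumerate(ds):
        x = tf.reshape(x, [-1, 28 * 28])   # flatten images for a Dense network
        # forward pass, loss, and gradient update would go here
        pass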
Fully Connected Layer
Dense
x = tf.random.normal([4,784])
net = tf.keras.layers.Dense(512)
out = net(x)
out.shape
TensorShape([4, 512])
net.kernel.shape, net.bias.shape
# [784, 512], [512]
net = tf.keras.layers.Dense(10)
net.bias
net.get_weights() # []
net.weights # []
net.build(input_shape=(None, 4))
net.kernel.shape, net.bias.shape
# [4, 10], [10]
net.build(input_shape=(None, 20))
net.kernel.shape, net.bias.shape
# [20, 10], [10]
Sequential
x = tf.random.normal([2, 3])
model = keras.Sequential([
keras.layers.Dense(2, activation='relu'),
keras.layers.Dense(2, activation='relu'),
keras.layers.Dense(2)
])
model.build(input_shape=[None, 3])
model.summary()
for p in model.trainable_variables:
    print(p.name, p.shape)
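For completeness, the x defined above can be fed straight through the built model; a quick sketch:
out = model(x)
out.shape  # TensorShape([2, 2])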
Output
$y \in R^d$
- linear regression
- naive classification with MSE
- other general prediction
- out = relu(X@W + b)
- logits
$y_i \in [0, 1]$
- binary classification
- $y > 0.5 \rightarrow 1$
- Image Generation
- rgb
tf.sigmoid
$\sigma(x) = \frac{1}{1+e^{-x}}$
a = tf.linspace(-6., 6, 10)
tf.sigmoid(a)
x = tf.random.normal([1, 28, 28])*5
tf.reduce_min(x), tf.reduce_max(x)
x = tf.sigmoid(x)
tf.reduce_min(x), tf.reduce_max(x)
$y_i \in [0, 1],\ \sum{y_i} = 1$
softmax
a = tf.linspace(-2., 2, 5)
tf.nn.softmax(a)
Classification
logits = tf.random.uniform([1, 10], minval=-2, maxval=2)
prob = tf.nn.softmax(logits, axis=1)
tf.reduce_sum(prob, axis=1)
$y_i \in [-1, 1]$
tanh
a = tf.linspace(-2., 2, 5)
tf.tanh(a)
Loss Functions
MSE
- $MSE = \frac{1}{N} \sum{(y - out)^2}$
- $L_{2\text{-}norm} = \sqrt{\sum{(y - out)^2}}$
y = tf.constant([1, 2, 3, 0, 2])
y = tf.one_hot(y, depth=4)
y = tf.cast(y, dtype=tf.float32)
out = tf.random.normal([5, 4])
loss1 = tf.reduce_mean(tf.square(y - out))
loss2 = tf.square(tf.norm(y - out))/(5 * 4)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))  # tf.losses.MSE returns shape (5,); reduce_mean averages over the batch
# loss1 == loss2 == loss3
Cross Entropy Loss
Entropy
- $H(P) = -\sum\limits_{i}{p(i)\log p(i)} = -E_{x\sim p}[\log p(x)]$
Cross Entropy
- $H(p,q) = -\sum{p(x)\log q(x)} = -E_{x\sim p}[\log q(x)]$
- $H(p,q) = H(p) + D_{KL}(p\|q)$
- KL divergence measures the difference between two distributions (a small numerical sketch follows after this list)
- $D_{KL}(p\|q) = E_{x\sim p}\left[\log\frac{p(x)}{q(x)}\right]$
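A small numerical sketch of the three quantities above (base-2 logs; p and q are illustrative distributions):
p = tf.constant([0.25, 0.25, 0.25, 0.25])
q = tf.constant([0.97, 0.01, 0.01, 0.01])
H_p  = -tf.reduce_sum(p * tf.math.log(p) / tf.math.log(2.))      # entropy H(p) = 2.0 bits (uniform is maximally uncertain)
H_pq = -tf.reduce_sum(p * tf.math.log(q) / tf.math.log(2.))      # cross entropy H(p, q)
kl   = tf.reduce_sum(p * tf.math.log(p / q) / tf.math.log(2.))   # D_KL(p || q) = H(p, q) - H(p)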
Classification
Single output(binary)
$loss = -(y\log\hat y + (1-y)\log(1-\hat y))$
Multi output
with one-hot encoding:
$loss = -\log q_i\ \text{where}\ p_i = 1$
Categorical Cross Entropy
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
#1.3862
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
#0.0304
tf.losses.BinaryCrossentropy()([1], [0.1])  # class-style API
tf.losses.binary_crossentropy([1], [0.1])   # functional API
Numerical Stability
x = tf.random.normal([1, 784])
w = tf.random.normal([784, 2])
b = tf.zeros([2])
logits = x @ w + b
prob = tf.math.softmax(logits, axis=1)
tf.losses.categorical_crossentropy([0, 1], logits, from_logits=True)
tf.losses.categorical_crossentropy([0, 1], prob)  # same result as above, but not recommended (numerically less stable)
With the TF 2.1 update, these losses were consolidated into tf.keras.losses.
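A hedged sketch of the equivalent class-based API (the call signature is loss(y_true, y_pred); this reuses the logits tensor from above):
criterion = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
criterion(tf.constant([[0., 1.]]), logits)   # averages over the batch; matches the functional call above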
This article covered loading data in TensorFlow 2.x with keras.datasets (boston housing, MNIST, and so on) and working with tf.data.Dataset, then walked through fully connected layers (Dense and Sequential models), the different output types (sigmoid, softmax, tanh), and loss functions, including MSE and cross entropy and their use in classification.