Hands-On Machine Learning

These notes walk through a range of machine learning concepts and techniques, including common Scikit-Learn operations such as OneHotEncoder, grid search, SGDClassifier, and the confusion matrix. They then move on to building neural networks in TensorFlow, covering activation functions, complex model architectures, optimizers, and RNNs (LSTM). The content ranges from linear regression to deep learning and is a useful companion for studying both.

Scikit-Learn

Common notation

m is the number of instances
$x^{(i)}$ is a vector of all the feature values of the $i^{th}$ instance
X is a matrix containing all feature values (excluding labels):
$$X = \begin{pmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(2000)})^T \end{pmatrix}$$
h is the prediction function, also called the hypothesis
$\hat y = h(x^{(1)}) = 158{,}300$

OneHotEncoder p67

Given a column of n rows containing m distinct categorical values, the encoder converts it into an n × m sparse matrix in which the element for the matching category is 1 and all other elements are 0.
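
A small hedged sketch of this behavior (the column values are made-up examples): 4 rows with 3 distinct categories give a 4 × 3 sparse matrix.

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X_cat = np.array([["INLAND"], ["NEAR OCEAN"], ["INLAND"], ["ISLAND"]])  # n = 4 rows, m = 3 categories
encoder = OneHotEncoder()
X_1hot = encoder.fit_transform(X_cat)   # returns a SciPy sparse matrix
print(X_1hot.toarray())                 # 4 x 3, a single 1 per row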

Grid Search p76

Try every combination of hyperparameters with a chosen estimator and return the best one.
Code:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
param_grid = [
   {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
   {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
   ]
forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error',
                           return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)

GridSearchCV.best_params_ returns the best parameters as a dict
GridSearchCV.best_estimator_ returns the best estimator

For more randomness, try RandomizedSearchCV from sklearn.model_selection (p78).

How to evaluate on the test set p79

1. X_test, y_test come from the test set (features and labels)
2. X_test_prepared = pipeline.transform(X_test)  # transform only; never fit the pipeline on the test set
3. final_predictions = final_model.predict(X_test_prepared)
4. final_mse = mean_squared_error(y_test, final_predictions)
5. final_rmse = np.sqrt(final_mse)

SGDClassifier p88

Stochastic Gradient Descent (a linear model)

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)

  • The loss is computed on one randomly chosen sample at a time, instead of summing the loss over all samples.
  • Faster than ordinary (batch) gradient descent.

cross_val_predict p90

Code:

from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(<estimator>, X_train, y_train_5, cv = 3)
The estimator here does not need to be fitted beforehand; every prediction is 'clean' (each one is made by a model that never saw that sample during training).

confusion_matrix p90

$$\begin{pmatrix} 53057 & 1522 \\ 1325 & 4096 \end{pmatrix} = \begin{pmatrix} \text{True Negative (TN)} & \text{False Positive (FP)} \\ \text{False Negative (FN)} & \text{True Positive (TP)} \end{pmatrix}$$

Precision = $\displaystyle \frac{TP}{TP + FP}$; close to 1 means everything flagged as positive really is positive.

Recall = $\displaystyle \frac{TP}{TP + FN}$; close to 1 means all the actual positives were caught.

Code:

from sklearn.metrics import confusion_matrix
confusion_matrix(y_train_5, y_train_pred)  # arguments: (actual labels, predicted labels)
#precision and recall
from sklearn.metrics import precision_score, recall_score
precision_score(y_train_5, y_train_pred)
recall_score(y_train_5, y_train_pred)

receiver operating characteristic (ROC) curve p97

Plot the true positive rate (recall) against the false positive rate.
True positive rate = recall = $\displaystyle \frac{TP}{TP + FN}$

False positive rate (FPR) = $\displaystyle \frac{FP}{FP + TN}$

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
plt.plot(fpr, tpr, linewidth=2)

OvR, OvO p100

OvR: one-versus-the-rest

  • Run N binary classifiers (one per class) and pick the class with the highest score.

OvO: one-versus-one

  • Run $\frac{N(N-1)}{2}$ binary classifiers (one per pair of classes) and pick the class that wins the most duels.
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier  # or OneVsOneClassifier
ovr_clf = OneVsRestClassifier(SVC())
ovr_clf.fit(X_train, y_train)

Linear Regression

$\hat y = \theta_0 + \theta_1 x_1 + ... + \theta_n x_n$
Note that $\theta_0$ is the bias term; when multiplying by x, add $x_0 = 1$ to each instance.

X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance
Normal Equation: $\hat\theta = (X^T X)^{-1} X^T y$
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

support vector machine

Binary classifier

If the problem is not binary, Scikit-Learn automatically runs OvO.

from sklearn.svm import SVC
svm_clf = SVC()
svm_clf.fit(X_train, y_train)
svm_clf.predict([some_digit])
# or an SGDClassifier also works

TF embedding layer

Converts a categorical value into a (64-)dimensional vector; the closer two categories are, the smaller the angle between their vectors.

import tensorflow as tf
from tensorflow import keras
import numpy as np
model = keras.Sequential()
model.add(tf.keras.layers.Embedding(1000, 64, batch_input_shape=[32, None]))
y = model.predict([5])
y.shape
#(1,1,64)

softmax

$softmax(x_i) = \displaystyle\frac{\exp(x_i)}{\sum^n_{j=1}\exp(x_j)}$
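
A small NumPy sketch of the formula above, applied to a made-up score vector:

import numpy as np

def softmax(x):
    exps = np.exp(x - x.max())   # subtract the max for numerical stability
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.66, 0.24, 0.10]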

PCA p225

Projects the data onto a lower-dimensional space while preserving the maximum variance.

SVD: Singular Value Decomposition, a theorem stating that any matrix $A_{m \times n}$ can be decomposed as $U_{m \times m} S_{m \times n} V_{n \times n}^T$.

Detailed introduction to SVD and its algorithm
V contains our principal components.
Here $U$ and $V^T$ are orthogonal matrices, i.e. $U^T U = I$, and $S$ is a diagonal matrix.
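
A minimal sketch of PCA via this decomposition, assuming X is the training matrix and we keep the first 2 components:

import numpy as np

X_centered = X - X.mean(axis=0)        # PCA assumes the data is centered
U, s, Vt = np.linalg.svd(X_centered)   # A = U S V^T
W2 = Vt.T[:, :2]                       # first 2 principal components (columns of V)
X2D = X_centered @ W2                  # project onto the lower-dimensional space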

Incremental PCA (IPCA)

split the training set into mini-batches.

Kernel PCA (KPCA)

Uses the kernel trick, for example an RBF kernel.
Use GridSearchCV to select the kernel and hyperparameters (see the sketch below).
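
A hedged sketch of selecting the kernel and gamma with GridSearchCV, assuming a labeled dataset (X, y) and scoring each setting through a downstream classifier:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]
grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)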

LLE

Manifold learning technique
First measure how each training instance linearly relates to its closest neighbors,
then look for a low-dimensional representation of the training set in which these local relationships are best preserved.
$$\hat W = \underset{W}{\operatorname{argmin}} \sum_{i=1}^m \left( x^{(i)} - \sum_{j=1}^m w_{i,j}\, x^{(j)} \right)^2 \quad \text{subject to} \begin{cases} w_{i,j} = 0 & \text{if } x^{(j)} \text{ is not one of the } k \text{ closest neighbors of } x^{(i)} \\ \sum_{j=1}^m w_{i,j} = 1 & \text{for } i = 1, 2, ..., m \end{cases}$$
W is an m × m matrix, each row of W sums to 1, and a larger $w_{i,j}$ means $x^{(i)}$ and $x^{(j)}$ are closer.
Then we look for $\hat Z$:
$$\hat Z = \underset{Z}{\operatorname{argmin}} \sum_{i=1}^m \left( z^{(i)} - \sum_{j=1}^m \hat w_{i,j}\, z^{(j)} \right)^2$$
Complexity: $O(m \log(m)\, n \log(k))$ for finding the k nearest neighbors, $O(mnk^3)$ for optimizing the weights, and $O(dm^2)$ for constructing the low-dimensional representations z.
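
A short sketch of LLE in Scikit-Learn (X is assumed to be the training set):

from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)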

K-Means p239

Unsupervised learning: find k centroids.
Inertia is the mean squared distance between each instance and its closest centroid.
The larger k is, the smaller the inertia.
How to choose k? Plot the inertia as a function of k and pick the elbow (a coarse approach), or use the silhouette score:
silhouette score

The mean silhouette coefficient over all the instances.
An instance's silhouette coefficient equals (b - a) / max(a, b),
where a is the mean distance to the other instances in the same cluster and b is the mean distance to the instances of the next closest cluster.
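
A short sketch of computing the silhouette score (X is assumed to be the training set; the KMeans model is just an example):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans = KMeans(n_clusters=5).fit(X)
print(silhouette_score(X, kmeans.labels_))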

from sklearn.cluster import KMeans
k = 5
kmeans = KMeans(n_clusters=k)
y_pred = kmeans.fit_predict(X)
>>> y_pred
array([4, 0, 1, ..., 2, 1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[-2.8, 1.8],
[0.2, 2.2],
[-2.7, 2.7],
[-1.4, 2.2],
[-2.8, 1.3]])

Process

  1. start by placing the centroids randomly
  2. label the instances (assign each one to its closest centroid)
  3. update the centroids
  4. repeat 2 and 3 until convergence (a minimal sketch follows the list below)
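
A minimal NumPy sketch of the loop above (Lloyd's algorithm); X is assumed to be an (m, n) array and k the number of clusters. Empty clusters are not handled here.

import numpy as np

def kmeans_sketch(X, k, n_iters=10, seed=42):
    rng = np.random.default_rng(seed)
    # 1. place the centroids randomly (here: k random training instances)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. label the instances: assign each one to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. update each centroid to the mean of the instances assigned to it
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels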

We can avoid an unlucky initialization by specifying good initial centroids:

good_init = np.array([[-3, 3], [-3, 2], [-3, 1], [-1, 2], [0, 2]])
kmeans = KMeans(n_clusters=5, init=good_init, n_init=1)

Or by running the algorithm several times with different random initializations (a higher n_init) and keeping the best solution, or by using K-Means++:

K-Means++

  1. Take one random centroid $c^{(1)}$
  2. Choose an instance $x^{(i)}$ with probability $D(x^{(i)})^2 / \sum^m_{j=1} D(x^{(j)})^2$, where $D(x^{(i)})$ is the distance from $x^{(i)}$ to the closest centroid already chosen; repeat until k centroids have been picked

Accelerated K-Means – by Charles Elkan
Uses the triangle inequality to skip unnecessary distance calculations.

MiniBatchKMeans

As the name suggests, it uses mini-batches, which speeds up the algorithm by a factor of three or four.

Gaussian Mixtures

Assumes each cluster follows a Gaussian distribution.

If $z^{(i)} = j$, then $x^{(i)} \sim \mathcal{N}(\mu^{(j)}, \Sigma^{(j)})$.
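
A minimal sketch of fitting a Gaussian mixture in Scikit-Learn (X is assumed to be the training set; 3 clusters is an assumption):

from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=3, n_init=10)
gm.fit(X)
print(gm.means_)             # one mean vector per cluster
y_pred = gm.predict(X)       # hard cluster assignments
probs = gm.predict_proba(X)  # soft assignments (per-cluster probabilities)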
Other clustering algorithms:
  • BIRCH: designed for large datasets
  • Mean-Shift

TensorFlow

Perceptron

Roughly the same idea as a neuron (probably?). Each perceptron computes $z = w^T x$ and then applies a threshold:
$heaviside(z) = \begin{cases} 0 & z < 0 \\ 1 & z \geq 0 \end{cases}$
Every layer also has a bias, represented by an extra neuron; this bias neuron has no input edges, only output edges.
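
A tiny NumPy sketch of a single perceptron's forward pass (the weights, bias, and input are made-up values):

import numpy as np

def heaviside(z):
    return (z >= 0).astype(int)

w = np.array([0.5, -0.4])  # weights
b = 0.1                    # bias (the extra bias neuron outputs a constant 1)
x = np.array([1.0, 2.0])   # one input instance
y = heaviside(w @ x + b)   # z = w^T x + b, then the step function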

Activation functions p292

Without non-linearity, an MLP (multi-layer perceptron) would just be equivalent to a single layer.

Leaky Relu

$LReLU(x) = max(\alpha x, x)$

Parametric leaky ReLU

$PReLU(x) = max(\alpha x, x)$, where $\alpha$ is trainable

Scaled ELU (SELU)

$SELU(x) = \lambda \begin{cases} x & x \geq 0 \\ \alpha(e^x - 1) & x < 0 \end{cases}$
Compared with ELU it adds a scale factor λ; it appears to be the best-performing choice in practice.
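
A hedged sketch of using these activations in Keras (layer sizes and input shape are assumptions); SELU is typically paired with LeCun initialization, while LeakyReLU/PReLU are added as separate layers:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=[28 * 28]),
    keras.layers.Dense(100, kernel_initializer="he_normal"),
    keras.layers.LeakyReLU(alpha=0.2),   # LReLU(x) = max(0.2x, x)
    keras.layers.Dense(100, kernel_initializer="he_normal"),
    keras.layers.PReLU(),                # alpha is learned during training
    keras.layers.Dense(10, activation="softmax"),
])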

Complex Models

Wide & Deep nn p309

Concatenate the input layer directly with the last hidden layer, then feed the result through the output layer (see the sketch after the list below).

  • This lets the model capture the simplest, most basic patterns as well as the deep ones.
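
A sketch of a Wide & Deep model with the Keras functional API (8 input features and the layer sizes are assumptions):

from tensorflow import keras

input_ = keras.layers.Input(shape=[8])
hidden1 = keras.layers.Dense(30, activation="relu")(input_)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])  # wide path + deep path
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_], outputs=[output])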

Different ways to use Keras

Subclassing API to build dynamic models p313

keras.Model can be subclassed.
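
A minimal hedged sketch of subclassing keras.Model (the class name and layer sizes are made up):

from tensorflow import keras

class SimpleRegressor(keras.Model):
    def __init__(self, units=30, **kwargs):
        super().__init__(**kwargs)
        self.hidden = keras.layers.Dense(units, activation="relu")
        self.out = keras.layers.Dense(1)

    def call(self, inputs):
        # the forward pass can contain arbitrary Python logic (loops, branches, ...)
        return self.out(self.hidden(inputs))

model = SimpleRegressor()
model.compile(loss="mse", optimizer="sgd")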

Save a model p314
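
A short hedged sketch (the model and the filename are assumptions):

from tensorflow import keras

model.save("my_keras_model.h5")                       # architecture + weights + optimizer state
model = keras.models.load_model("my_keras_model.h5")  # restore it later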

Using Callbacks p315

Pass callbacks=[f1, f2, ...] to model.fit, where f1, f2 are callback objects from keras.callbacks (see the sketch below).
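
A sketch of two common callbacks (the filename, patience value, and data variables are assumptions):

from tensorflow import keras

checkpoint_cb = keras.callbacks.ModelCheckpoint("my_keras_model.h5", save_best_only=True)
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, early_stopping_cb])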

TensorBoard p317

An interactive visualization tool for viewing the learning curves during training.

Grid Search p321

RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3)

zooming

When a region of the parameter space looks good, search more finely within it.

Vanishing/Exploding Gradient problem p332

Q: I don't quite understand why gradients vanish, given that each neuron is a weighted sum over the previous layer.

  • The next layer's weights are both positive and negative, so you can reason about it as if each layer had a single neuron.

Xavier/Glorot initialization

fan_in: the number of inputs of a layer
fan_out: the number of neurons of a layer
fan_avg = (fan_in + fan_out) / 2

Two options: a normal distribution with mean 0 and variance $\sigma^2 = \frac{1}{fan_{avg}}$,
or a uniform distribution between -r and +r, with $r = \sqrt{\frac{3}{fan_{avg}}}$.

Initialization   Activation functions            σ² (variance)
Glorot           None, tanh, logistic, softmax   1/fan_avg
He               ReLU and variants               2/fan_in
LeCun            SELU                            1/fan_in

A mean of 0 is very important; otherwise the activations may explode.
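
A hedged sketch of picking the initializer per the table above in Keras (layer sizes are assumptions):

from tensorflow import keras

relu_layer = keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal")
selu_layer = keras.layers.Dense(100, activation="selu", kernel_initializer="lecun_normal")
# He initialization with fan_avg and a uniform distribution:
he_avg_init = keras.initializers.VarianceScaling(scale=2., mode="fan_avg", distribution="uniform")
custom_layer = keras.layers.Dense(100, activation="relu", kernel_initializer=he_avg_init)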

Nonsaturating Activation Functions:

A non-saturating activation function f is one for which $\left|\lim_{z \to -\infty} f(z)\right| = +\infty$ or $\left|\lim_{z \to +\infty} f(z)\right| = +\infty$.
A saturating function is bounded on both sides.

Batch Normalization

Let the model learn the optimal scale and mean of each of the layer's inputs.
B refers to the mini-batch.
$$\mu_B = \frac{1}{m_B} \sum_{i=1}^{m_B} x^{(i)} \qquad \sigma_B^2 = \frac{1}{m_B}\sum_{i=1}^{m_B} (x^{(i)} - \mu_B)^2 \qquad \hat{x}^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad z^{(i)} = \gamma \otimes \hat{x}^{(i)} + \beta$$
$\gamma, \beta$ are the parameters we train.
Training takes longer, but runtime speed is unchanged, because the normalization can be folded into the previous layer: $XW + b \rightarrow \gamma \otimes (XW + b) + \beta$ (a sketch follows below).
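
A minimal sketch of adding Batch Normalization layers in Keras (layer sizes and input shape are assumptions):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])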

Gradient Clipping

If a gradient component is greater than 1 or less than -1, clip it to the interval [-1, 1].
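
A sketch of gradient clipping in Keras, configured on the optimizer (the model is assumed to exist):

from tensorflow import keras

optimizer = keras.optimizers.SGD(clipvalue=1.0)   # clip each gradient component to [-1, 1]
# optimizer = keras.optimizers.SGD(clipnorm=1.0)  # or clip the whole gradient's norm instead
model.compile(loss="mse", optimizer=optimizer)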

Transfer Learning p348

Reuse pretrained layers.
Transfer learning works best when the low-level features are similar.
Lower layers correspond to low-level features;
upper layers correspond to high-level features.
So we drop the deep (upper) layers, keep the shallow (lower) ones, and freeze them (see the sketch below).
Example of a low-level feature: first-order image gradients.
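
A hedged sketch of reusing pretrained lower layers in Keras, assuming model_A is a pretrained Sequential model whose output layer we replace for the new task:

from tensorflow import keras

model_B = keras.models.Sequential(model_A.layers[:-1])    # keep the lower layers
model_B.add(keras.layers.Dense(1, activation="sigmoid"))  # new output layer
for layer in model_B.layers[:-1]:
    layer.trainable = False                               # freeze the reused layers
model_B.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])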

Pretraining on an Auxiliary Task

What if there is no similar pretrained model to transfer from?

  • Train one yourself on an auxiliary task.
    For example, I want to train an owl recognizer but do not have many owl photos.
    If I have lots of bird photos, I can first train a model that recognizes birds,
    then freeze that model, add a few layers on top, and continue training on the owl data.

Optimizer

Momentum Optimizer

$$m \leftarrow \beta m - \eta \nabla_\theta J(\theta) \qquad \theta \leftarrow \theta + m$$

Nesterov Accelerated Gradient

$$m \leftarrow \beta m - \eta \nabla_\theta J(\theta + \beta m) \qquad \theta \leftarrow \theta + m \qquad (\eta = 0.001,\ \beta = 0.9 \text{ by default})$$
It replaces the usual $\theta$ with $\theta + \beta m$, i.e. it measures the gradient slightly ahead, in the direction of the momentum.

AdaGrad

Scales down the gradient vector along the steepest dimensions.
It shrinks the components with the largest gradients so that the descent points more directly toward the optimum.
The effective learning rate decays over time, but the early descent is more direct.
$$s \leftarrow s + \nabla_\theta J(\theta) \otimes \nabla_\theta J(\theta) \qquad \theta \leftarrow \theta - \eta \nabla_\theta J(\theta) \oslash \sqrt{s+\epsilon}$$
Do not use it for DNNs: s accumulates the squared gradients forever, so the effective learning rate keeps shrinking and training often stops too early, before reaching the optimum.

RMSProp

$$s \leftarrow \beta s + (1-\beta)\, \nabla_\theta J(\theta) \otimes \nabla_\theta J(\theta) \qquad \theta \leftarrow \theta - \eta \nabla_\theta J(\theta) \oslash \sqrt{s+\epsilon}$$

By keeping only an exponentially decaying average of recent gradients, s stays smaller, so the updates to $\theta$ stay larger.

Adam and Nadam

Here comes the most commonly used one.
$$m \leftarrow \beta_1 m - (1-\beta_1)\, \nabla_\theta J(\theta)\\
s \leftarrow \beta_2 s + (1-\beta_2)\, \nabla_\theta J(\theta) \otimes \nabla_\theta J(\theta)\\
\hat m \leftarrow \frac{m}{1-\beta_1^t} \qquad \hat s \leftarrow \frac{s}{1-\beta_2^t}\\
\theta \leftarrow \theta + \eta\, \hat m \oslash \sqrt{\hat s + \epsilon}$$
(t is the iteration number; Nadam is Adam plus the Nesterov trick.)
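
A sketch of instantiating these optimizers in Keras (the hyperparameters shown are the usual defaults, not tuned values):

from tensorflow import keras

momentum_opt = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
nesterov_opt = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)
rmsprop_opt  = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
adam_opt     = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
nadam_opt    = keras.optimizers.Nadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)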

RNN(LSTM)

A sigmoid represents a gate: a probability between 0 and 1.
[Figure: LSTM cell]
From left to right, the first sigmoid, $f_t$, is the forget gate: $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$

  • Wf: input weight for the forget gate
  • Uf: recurrent weight for the forget gate
  • bf: bias weight for the forget gate

From left to right, the second sigmoid, $i_t$, is the update gate: $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$

  • Wi: input weight for the update gate
  • Ui: recurrent weight for the update gate
  • bi: bias weight for the update gate.

From left to right, the first tanh, $\tilde{c}_t$, is the candidate value (note the activation here is $\sigma_h$, i.e. tanh): $\tilde{c}_t = \sigma_h(W_c x_t + U_c h_{t-1} + b_c)$

  • Wc: input weight for the candidate value
  • Uc: recurrent weight for the candidate value
  • bc: bias weight for the candidate value

The new cell state $C_t$: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
Note that the two multiplications here correspond to the two multiplication nodes in the figure, and the addition corresponds to the addition node.

From left to right, the third sigmoid, $o_t$, is the output gate: $o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$

  • Wo: input weight for the output gate
  • Uo: recurrent weight for the output gate
  • bo: bias weight for the output gate

From left to right, the second tanh gives the final output:
$h_t = o_t * \tanh(C_t)$

$W_f$: input weight for the forget gate
$U_f$: recurrent weight for the forget gate
$b_f$: bias weight for the forget gate
$W_i$: input weight for the update gate
$U_i$: recurrent weight for the update gate
$b_i$: bias weight for the update gate
$W_c$: input weight for the candidate value
$U_c$: recurrent weight for the candidate value
$b_c$: bias weight for the candidate value
$W_o$: input weight for the output gate
$U_o$: recurrent weight for the output gate
$b_o$: bias weight for the output gate
So there you have it: there are 12 weights, which can largely be divided into three camps:
input weights, which multiply with the input timestep
recurrent weights, which multiply with the previous output
bias weights

Q&A

  • tanh outputs values in [-1, 1] and $o_t$ is in [0, 1], so the final output $h_t$ is confined to [-1, 1]?
  • The sigmoids handle the forget/keep probabilities; the tanh handles the values.
  • There seems to be no DNN here, only matrix multiplications. A single matmul cannot represent a DNN because it has no non-linearity.

Code level of LSTM

Original article link
Pre-knowledge: the Keras Layer class.
Three important methods: __init__(), build(), call()
You can create the weights (w) and biases (b) in __init__, but it is recommended to do so in build.
__init__ runs when the layer object is created; build runs the first time the layer is called.

def build(self, input_shape):
	input_dim = input_shape[-1]
	...
	self.kernel = self.add_weight(shape=(input_dim, self.units * 4), ...)

The factor of 4 in self.units * 4 is because i (update), f (forget), c (candidate), and o (output) each need their own set of weights.

def call(self, inputs, states, training=None):
	...
	h_tm1 = states[0]  # previous memory state
	c_tm1 = states[1]  # previous carry state
	i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
	                                          self.recurrent_kernel_i))
	f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
	                                          self.recurrent_kernel_f))
	c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c,
	                                                self.recurrent_kernel_c))
	o = self.recurrent_activation(x_o + K.dot(h_tm1_o,
	                                          self.recurrent_kernel_o))
	...
	h = o * self.activation(c)

These lines correspond, one by one, to the equations above.
Note that self.recurrent_activation and self.activation are different: recurrent_activation is usually a sigmoid and activation is usually tanh, though both can be configured.

GRU

[Figure: GRU cell]

Update gate:
$u_t = \sigma(W_u h_{t-1} + U_u x_t + b_u)$
Reset gate:
$r_t = \sigma(W_r h_{t-1} + U_r x_t + b_r)$
New hidden state content:
$\tilde h_t = \tanh(W_h(r_t \otimes h_{t-1}) + U_h x_t + b_h)$
Hidden state:
$h_t = (1-u_t) \otimes h_{t-1} + u_t \otimes \tilde h_t$
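
A minimal sketch of stacking GRU layers in Keras for a univariate sequence (the input shape and layer sizes are assumptions):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.GRU(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(20),
    keras.layers.Dense(1),
])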

Reference: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2017, ISBN 1491962291.