Interview questions: https://blog.youkuaiyun.com/qq_23225317/article/details/82467755
https://blog.youkuaiyun.com/comway_Li/article/details/82532573
https://blog.youkuaiyun.com/comway_Li/article/details/82947716
https://blog.youkuaiyun.com/xwd18280820053/article/details/76026523
https://blog.youkuaiyun.com/xwd18280820053/article/details/77529906
1. Overfitting: what are the ways to address it?
a. Dropout, inverted dropout, etc. (a NumPy sketch of inverted dropout follows the topic list), https://blog.youkuaiyun.com/program_developer/article/details/80737724, https://yq.aliyun.com/articles/68901, https://blog.youkuaiyun.com/dongapple/article/details/77996500
b. L1 and L2 regularization and the principles behind them: L1 corresponds to lasso regression and L2 to ridge regression; L1 regularization can drive parameters exactly to 0, while L2 pushes them toward small values, https://zhuanlan.zhihu.com/p/35356992?utm_medium=social&utm_source=wechat_session
c. Batch normalization, covariate shift, and how it helps prevent exploding gradients, https://www.cnblogs.com/bonelee/p/8528722.html, https://blog.youkuaiyun.com/guoyuhaoaaa/article/details/80236500
2. Loss functions, https://www.jianshu.com/p/b715888f079b
a. softmax, softmax loss, cross-entropy loss, https://blog.youkuaiyun.com/u014380165/article/details/77284921
b. Derivative of softmax with respect to its input (a worked softmax/cross-entropy sketch follows the topic list), https://blog.youkuaiyun.com/u014380165/article/details/79632950
c. Backpropagation, https://www.jianshu.com/p/6908be0c5389
3. Activation functions: why are they needed, and what are the advantages and disadvantages of each? https://blog.youkuaiyun.com/kangyi411/article/details/78969642, https://blog.youkuaiyun.com/tyhj_sf/article/details/79932893
4. Vanishing and exploding gradients, https://blog.youkuaiyun.com/qq_25737169/article/details/78847691
5. Optimizers: how do you choose one? What are the differences between Adam and SGD? Does SGD find a global or a local optimum, and why? (Update-rule sketches follow the topic list.) https://blog.youkuaiyun.com/g11d111/article/details/76639460
6. Xavier initialization and weight-initialization issues in neural networks (a sketch follows the topic list), https://blog.youkuaiyun.com/u012328159/article/details/80025785, https://www.cnblogs.com/makefile/p/init-weight.html?utm_source=itdadao&utm_medium=referral
7. Differences between logistic regression and linear regression, https://blog.youkuaiyun.com/sinat_32329183/article/details/77835677; pay attention to the derivation of logistic regression (a gradient-descent sketch follows the topic list), https://blog.youkuaiyun.com/u014258807/article/details/80616647, https://zhuanlan.zhihu.com/p/35250134
8. The principles behind SVM and PCA, https://www.jianshu.com/p/d289755e89bb
9. K-means and KNN
10. What 1×1 convolution kernels are used for, https://blog.youkuaiyun.com/l7H9JA4/article/details/80650259
11. DenseNet and ResNet, https://blog.youkuaiyun.com/jiachen0212/article/details/78536018
12. Differences between boosting and bagging, https://www.cnblogs.com/liuwu265/p/4690486.html
13. GBDT https://blog.youkuaiyun.com/legendavid/article/details/78904353, https://www.cnblogs.com/jiangxinyang/p/9248154.html
14. Random forests and the OOB (out-of-bag) estimate, https://www.cnblogs.com/maybe2030/p/4585705.html
15. Entropy and information gain (a small worked sketch follows the topic list), http://www.cnblogs.com/fantasy01/p/4581803.html?utm_source=tuicool
https://www.cnblogs.com/pinard/p/6140514.html, http://www.cnblogs.com/pinard/p/6133937.html,https://www.cnblogs.com/peizhe123/p/5086128.html
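Below are a few minimal NumPy sketches for the topics above. All function names, shapes and hyperparameters are illustrative assumptions rather than code taken from the linked posts. First, inverted dropout (item 1a): the mask is applied and rescaled by 1/keep_prob at training time, so the forward pass needs no change at test time.

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.8, train=True):
    """Inverted dropout: rescale at training time so no scaling is needed at inference."""
    if not train or keep_prob >= 1.0:
        return x
    mask = np.random.rand(*x.shape) < keep_prob   # keep each unit with probability keep_prob
    return x * mask / keep_prob                    # rescale so the expected activation is unchanged
```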
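Softmax with cross-entropy loss (item 2b), a hedged sketch: the standard result is that the gradient of the loss with respect to the logits is softmax(z) minus the one-hot labels.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)          # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """labels: integer class indices. Returns the mean loss and the gradient w.r.t. the logits."""
    p = softmax(logits)
    n = logits.shape[0]
    loss = -np.log(p[np.arange(n), labels]).mean()
    grad = p.copy()
    grad[np.arange(n), labels] -= 1.0              # d(loss)/d(logits) = softmax(z) - one_hot(y)
    return loss, grad / n
```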
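Optimizers (item 5), a sketch of the two update rules: SGD takes a fixed-size step against the gradient, while Adam keeps exponential moving averages of the gradient and its square and adapts the step size per parameter. The defaults shown are the commonly quoted ones, not values from the linked article.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: move against the gradient with a fixed learning rate."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: first/second moment estimates with bias correction (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                   # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```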
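Xavier/Glorot initialization (item 6), a sketch: scale the initial weights by the fan-in and fan-out so the variance of activations stays roughly constant from layer to layer.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    """Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out):
    """Glorot normal: N(0, 2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))
```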
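Logistic regression (item 7), a sketch of where the derivation lands: minimizing the negative log-likelihood by gradient descent, whose gradient is X^T(sigmoid(Xw) - y) / n.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, steps=1000):
    """Gradient descent on the mean negative log-likelihood; y contains 0/1 labels."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)                 # predicted probabilities
        w -= lr * X.T @ (p - y) / n        # gradient of the mean negative log-likelihood
    return w
```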
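Entropy and information gain (item 15), a small worked sketch: information gain is the drop in label entropy after splitting on a feature.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Y) = -sum p * log2(p) over the empirical class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, feature):
    """IG = H(Y) - H(Y | feature): entropy reduction from splitting on the feature's values."""
    labels, feature = np.asarray(labels), np.asarray(feature)
    cond = sum((feature == v).mean() * entropy(labels[feature == v]) for v in np.unique(feature))
    return entropy(labels) - cond
```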
Why introduce non-linear activation functions?
Because without non-linear activation functions, every layer is a linear function of the previous one, so no matter how many layers the network has, the output is just a linear combination of the input and is no more expressive than a single linear layer (essentially the original perceptron). Introducing non-linear activations is what makes deep networks meaningful: they can then approximate essentially arbitrary functions.
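A quick NumPy check of the claim above (the matrix shapes here are arbitrary illustrations): two stacked linear layers collapse into a single linear map, and only a non-linearity in between prevents the collapse.

```python
import numpy as np

x = np.random.randn(4, 10)
W1 = np.random.randn(10, 32)
W2 = np.random.randn(32, 5)

stacked = (x @ W1) @ W2          # two "layers" with no activation in between
collapsed = x @ (W1 @ W2)        # a single equivalent linear layer
print(np.allclose(stacked, collapsed))    # True: depth adds nothing without a non-linearity

with_relu = np.maximum(x @ W1, 0) @ W2    # insert a ReLU between the layers
print(np.allclose(with_relu, collapsed))  # False: the non-linearity breaks the collapse
```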
What causes vanishing gradients? Can you derive it?
Many activation functions squash their output into a small range, and their gradient is close to zero over large parts of the domain at both saturated ends. Weight updates then become very slow, training becomes harder, and learning can stall. (The gradient at an earlier layer is a product of factors coming from the later layers; as the number of layers grows, the accumulated product becomes smaller and smaller.)
Vanishing and exploding gradients: the BP algorithm is based on gradient descent, adjusting the parameters along the negative gradient of the objective. The update is $w \leftarrow w + \Delta w$, and with learning rate $\alpha$ we have $\Delta w = -\alpha \frac{\partial Loss}{\partial w}$. To update the weights of the second hidden layer, the chain rule gives
$\Delta w_2 = -\alpha \frac{\partial Loss}{\partial w_2} = -\alpha \frac{\partial Loss}{\partial f_4}\frac{\partial f_4}{\partial f_3}\frac{\partial f_3}{\partial f_2}\frac{\partial f_2}{\partial w_2}$,
and it is easy to see that $\frac{\partial f_2}{\partial w_2} = f_1$, i.e. the input to the second hidden layer.
Factors such as $\frac{\partial f_4}{\partial f_3}$ are derivatives of the activation function (scaled by the layer weights). If each such factor is greater than 1, then as the number of layers grows the computed gradient grows exponentially, i.e. the gradient explodes; if each factor is less than 1, the gradient decays exponentially with depth, i.e. the gradient vanishes.
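A small numeric illustration of the argument above (the depths chosen are arbitrary): the sigmoid's derivative is at most 0.25, so a chain of such factors shrinks exponentially with depth even in the best case.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # maximal value 0.25, attained at z = 0

z = 0.0                           # evaluate at the point of steepest slope
for depth in (5, 10, 30):
    print(depth, sigmoid_grad(z) ** depth)   # 0.25**depth: roughly 1e-3, 1e-6, 1e-18
```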
optimizer
loss function
regression
initialization
ResNet and DenseNet