Today let's look at a very simple algorithm: SGD, stochastic gradient descent. Honestly, it looks unremarkable, yet it shows up in almost every corner of modern AI and is probably the most widely used optimization method in machine learning. Nearly every state-of-the-art machine learning or deep learning library ships several variants of gradient descent.
Diagram
[Figure: schematic of SGD (stochastic gradient descent) and its variants]
Algorithm details
What is it for?
In both GD and SGD, the model parameters are updated at every iteration so that the cost function decreases. That is, we minimize a loss L(w) with respect to a parameter w by stepping against its derivative:

w ← w − η · ∂L/∂w

where η is the learning rate. The difference between the two is how the derivative is obtained: GD computes it over the entire dataset, while SGD estimates it from a single sample (or a small mini-batch) at each step.
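To make that difference concrete, here is a minimal sketch of one full-batch GD step versus one SGD step for a linear least-squares model. This is illustrative code, not part of this post's program: the squared-error loss, the data matrix X, the targets y, and the names gd_step/sgd_step are all assumptions.

#include <Eigen/Dense>
#include <cstdlib>
using namespace Eigen;

// Full-batch GD: gradient of the mean squared error over all n samples,
// grad = (2/n) * X^T (X w - y), then w <- w - eta * grad.
VectorXd gd_step(const MatrixXd& X, const VectorXd& y, VectorXd w, double eta)
{
    VectorXd grad = (2.0 / X.rows()) * X.transpose() * (X * w - y);
    return w - eta * grad;
}

// SGD: the same update, but the gradient is estimated from one randomly
// chosen sample (row i of X) -- the sampling is what makes it "stochastic".
VectorXd sgd_step(const MatrixXd& X, const VectorXd& y, VectorXd w, double eta)
{
    int i = std::rand() % X.rows();
    VectorXd xi = X.row(i).transpose();
    VectorXd grad = 2.0 * xi * (xi.dot(w) - y(i));
    return w - eta * grad;
}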
Code
The derivative is computed numerically, with the loss passed in through a function pointer:
double derivation(double (*loss_function)(VectorXd*), VectorXd* X)
{
    // Shift every component of X by `step` (a global defined below),
    // then take a forward finite difference of the loss.
    VectorXd X_new = (X->array() + step).matrix();
    cout << "Now the loss is:" << loss_function(X) << endl;
    return (loss_function(&X_new) - loss_function(X)) / step;
}
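A note on accuracy: the forward difference above has error on the order of step. A central difference costs one extra loss evaluation but reduces the error to O(step^2). A possible variant (a sketch reusing this post's globals and signature; derivation_central is my name, not from the original code):

double derivation_central(double (*loss_function)(VectorXd*), VectorXd* X)
{
    VectorXd X_fwd = (X->array() + step).matrix(); // X shifted up by step
    VectorXd X_bwd = (X->array() - step).matrix(); // X shifted down by step
    // Symmetric difference quotient: O(step^2) error instead of O(step).
    return (loss_function(&X_fwd) - loss_function(&X_bwd)) / (2.0 * step);
}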
Full code:
#include <Eigen/Dense>
#include <iostream>
// g++ sgd.cpp -o sgd -I/download/eigen
#define MAX_STEPS 20
using namespace std;
using namespace Eigen;

static double W = 3.0;     // the parameter we optimize
static double step = 0.02; // finite-difference step size
static double nita = 0.3;  // learning rate (eta)

// Loss: L(W) = ||W * X||, minimized at W = 0.
double loss_function(VectorXd* X)
{
    return (W * (*X)).norm();
}

// Forward finite-difference estimate of the loss derivative,
// obtained by shifting every component of X by `step`.
double derivation(double (*loss_function)(VectorXd*), VectorXd* X)
{
    VectorXd X_new = (X->array() + step).matrix();
    cout << "Now the loss is:" << loss_function(X) << endl;
    return (loss_function(&X_new) - loss_function(X)) / step;
}

// Descent loop: W = W - nita * derivative, repeated MAX_STEPS times.
void sgd(double (*loss_function)(VectorXd*))
{
    VectorXd X(5);
    X.setConstant(1.1);
    for (int i = 0; i < MAX_STEPS; ++i)
    {
        W = W - nita * derivation(loss_function, &X);
    }
    cout << "After " << MAX_STEPS << " steps iteration the W is:" << endl;
    cout << W << endl;
}

int main(int argc, char const* argv[])
{
    sgd(loss_function);
    return 0;
}
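For this particular loss the derivative with respect to W can also be written in closed form, which makes a handy sanity check on the numerical routine: L(W) = ||W·X|| = |W|·||X||, so dL/dW = sign(W)·||X||. A minimal sketch (analytic_derivation is my name; it is not in the original program):

double analytic_derivation(const VectorXd& X)
{
    // d/dW of |W| * ||X|| is sign(W) * ||X|| (taking 0 at W = 0).
    double s = (W > 0) - (W < 0);
    return s * X.norm();
}

Note that the numerical derivation above perturbs X rather than W, so what it actually returns is |W| times the directional derivative of ||X|| along the all-ones vector. With W starting positive, that still shrinks W geometrically toward 0, which is why the printed loss decays by a constant factor at every step.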
Compile and run
root@master:/App/CSDN_blog/SGD# g++ SGD.cpp -o sgd -I/download/eigen
root@master:/App/CSDN_blog/SGD# ./sgd
Now the loss is:7.37902
Now the loss is:2.42902
Now the loss is:0.799585
Now the loss is:0.263207
Now the loss is:0.0866424
Now the loss is:0.0285209
Now the loss is:0.00938851
Now the loss is:0.0030905
Now the loss is:0.00101733
Now the loss is:0.000334885
Now the loss is:0.000110237
Now the loss is:3.62878e-05
Now the loss is:1.19452e-05
Now the loss is:3.93212e-06
Now the loss is:1.29437e-06
Now the loss is:4.26082e-07
Now the loss is:1.40257e-07
Now the loss is:4.61699e-08
Now the loss is:1.51982e-08
Now the loss is:5.00293e-09
After 20 steps iteration the W is:
6.69545e-10