土肥宅娘口三三 - CSDN Blog

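Original: Machine Learning Blog Index

This series contains partial study notes for the courses "Machine Learning Foundations" and "Machine Learning Techniques" taught by Hsuan-Tien Lin (林轩田) of National Taiwan University. Machine learning basics: Machine Learning Notes - Nonlinear Transformation; Machine Learning Notes - Hazard of Overfitting; Machine Learning Notes - Regularization; Machine Learning Notes - Validation; Machine Learning Notes - Linear Regression; Machine Learning Notes - Logistic Regression; Machine Learning Notes - Linear Models for Classification; SV...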

2018-06-16 08:21:17 1085

Original: Deep Learning Blog Index

Course1: Week2 - foundation of neural network; Week3 - one hidden layer neural network; Week4 - deep neural network. Course2: Week1 - setting up your ML application ...

2018-06-15 22:49:04 592

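Original: XGBoost 3 - XGBoost Principles and Usage

XGBoost principles. Contents: Boosting, AdaBoost, Gradient Boosting, XGBoost. 1 - Boosting. Boosting combines weak learners into a strong classifier. Building a high-performing strong learner directly is difficult, but building a mediocre weak learner is not. A weak learner is one that performs better than random guessing (a shallow CART is a good choice). $G(x)=\sum_{t=1}^{T}\alpha_t\phi_t(x)$ ...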

2018-07-03 14:30:30 1571 2
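
The weighted sum $G(x)=\sum_{t=1}^{T}\alpha_t\phi_t(x)$ in the excerpt above can be sketched in a few lines of Python. This is only an illustration: the decision stumps, the weights `alphas`, and the final sign step for classification are assumptions made for the example, not values or code from the post.

```python
# Hypothetical weak learners: decision stumps on a 1-D input, each returning +1 or -1.
def make_stump(threshold):
    return lambda x: 1 if x > threshold else -1


def boosted_predict(x, alphas, learners):
    """Return sign(G(x)) where G(x) = sum_t alpha_t * phi_t(x)."""
    score = sum(alpha * phi(x) for alpha, phi in zip(alphas, learners))
    return 1 if score >= 0 else -1


learners = [make_stump(0.0), make_stump(1.0), make_stump(2.0)]
alphas = [0.5, 0.3, 0.2]  # illustrative weights, not learned from data

print(boosted_predict(3.0, alphas, learners))
print(boosted_predict(-1.0, alphas, learners))
```

In a real boosting algorithm the weights would be learned round by round; here they are fixed by hand so that only the combination step is visible.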

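Original: XGBoost 2 - Machine Learning Basics

2 - Machine learning basics. Contents: supervised learning, classification and regression trees, random forests. 2.1 - Supervised learning: model, parameters, objective function, loss function, regularization term, optimization. 2.1.1 - Model. If y is discrete, the problem is classification; if y is continuous, it is regression. How do we predict the label $\hat{y}$ for a given x? For linear regression, the model is $\hat{y} = f(x) = \sum_j w_j x_j$ ...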

2018-07-03 14:25:02 679
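
As a small illustration of the linear regression model $\hat{y} = f(x) = \sum_j w_j x_j$ mentioned in the excerpt above, the following sketch computes a prediction as a weighted sum. The function name `predict` and the numbers are hypothetical and chosen only for the example.

```python
def predict(weights, features):
    """Linear model: y_hat = sum_j w_j * x_j."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


w = [2.0, -1.0, 0.5]   # hypothetical weights
x = [1.0, 3.0, 4.0]    # hypothetical feature vector
y_hat = predict(w, x)  # 2.0*1.0 - 1.0*3.0 + 0.5*4.0
print(y_hat)
```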

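Original: XGBoost 1 - Basics and Simple Usage

XGBoost (extreme gradient boosting) is an optimized implementation of the gradient boosting machine that is fast and effective. Contents: introduction to xgboost; xgboost features; basic usage guide for xgboost; theoretical foundations of xgboost (supervised learning, CART, boosting, gradient boosting, xgboost); xgboost in practice (feature engineering, ...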

2018-07-03 14:23:56 3887

Original: Course4-week4-face recognition and neural style transfer

1 - What is face recognition? This week covers a couple of important special applications of ConvNets: we start with face recognition and then go on to neural style transfer. Verification:...

2018-06-11 22:19:17 625

Original: Course4-week3-object detection

1 - Object localization. To build up to object detection, we first learn about object localization. Image classification: the algorithm looks at the picture and is responsible for saying ...

2018-06-11 22:04:20 764

Original: Course4-week2-case studies

Case studies. 1 - Why look at case studies? How do we put together the basic building blocks, such as CONV, POOL, and FC layers, to form an effective convolutional neural network? Outline: classic ...

2018-06-09 11:05:39 520

Original: Course4-week1-convolutional neural network

1 - Computer vision. Computer vision problems: image recognition, object detection, style transfer. One of the challenges of computer vision problems is that the input can get really big. Th...

2018-06-09 10:52:44 863

Original: Course3 - machine learning strategy 2

1 - Carrying out error analysis. If the learning algorithm is not yet at human-level performance, then manually examining the mistakes it makes can give us insight into what to do...

2018-06-08 17:56:03 892

Original: Course3 - machine learning strategy 1

Introduction to ML strategy. 1 - Why ML strategy? This course is about how to structure a machine learning project, that is, machine learning strategy. What is machine learning strategy? Let's say we are working ...

2018-06-08 17:44:16 610

Original: Course2-week3-hyperparameterTuning - BatchNormalization - Framework

Hyperparameter tuning. 1 - Tuning process. How do we systematically organize the hyperparameter tuning process? Hyperparameters: learning rate $\alpha$; $\beta$ in momentum (or keep the default 0.9); mini-b...

2018-06-08 16:45:36 578

Original: Course2-week2-optimization algorithm

Optimization algorithms. 1 - Mini-batch gradient descent. Vectorization allows you to compute efficiently on m examples, but if m is large it can still be very slow. With the implementation of gradient des...

2018-06-08 16:37:43 626

Original: Course2-week1-setting up your ML application

Setting up your ML application. 1 - Train/dev/test sets. This week we'll learn the practical aspects of how to make your neural network work well, ranging from things like hyperparameter tuning to ho...

2018-06-08 16:20:37 623

Original: Course1-week4-deep neural network

4.1 - Deep L-layer neural network. We have seen forward propagation and backward propagation in the context of a neural network with a single hidden layer, as well as logistic regression, and we le...

2018-06-08 15:58:23 417

Original: Course1-week3-one hidden layer neural network

3.1 - Neural networks overview. Some new notation has been introduced: we'll use a superscript square bracket 1 to refer to a layer of the neural network; for instance, $w^{[1]}$ represents the pa...

2018-06-08 15:48:48 547

Original: Course1-week2-foundation of neural network

Week 2: Basics of Neural Network Programming. 2.1 - Binary classification. $m$ training examples: $(x^{(1)}, y^{(1)}), \cdots, (x^{(m)}, y^{(m)})$. X=⎡...

2018-06-08 15:37:03 411

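Original: Brief Notes on Learning Keras

Keras: a Python-based deep learning library. Keras is a high-level neural network API written in pure Python and built on the TensorFlow, Theano, and CNTK backends. Keras supports Python 2.7-3.6. 1 - Some basic concepts. 1.1 - Symbolic computation. Keras uses Theano or TensorFlow as its underlying library; these two libraries are also called Keras backends. Whether Thea...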

2018-05-06 15:43:29 2559 1

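Original: Brief Notes on Using lambda Functions

lambda functions. A lambda is an anonymous function with the syntax: lambda parameters: expression. General usage: import numpy as np; sigmoid = lambda x: 1./(1.+np.exp(-x)); sigmoid(np.array([-10, 0, 10])) gives array([ 4.53978687e-05, 5.00000000e-01, 9.9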

2018-01-27 11:04:02 1723
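
The sigmoid lambda from the excerpt above can be tried without NumPy. The sketch below swaps in the standard `math` module (an assumption made so that it runs without third-party packages) and adds one more common use of a lambda, as an inline sort key; the `pairs` data is hypothetical.

```python
import math

# An anonymous function bound to a name: lambda parameters: expression
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

values = [sigmoid(x) for x in (-10, 0, 10)]
print(values)

# Lambdas are often passed inline, for example as a sort key.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda p: p[1])
print(by_count)
```

Binding a lambda to a name works, but a regular `def` is usually preferred for anything that needs a name or a docstring.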

Original: About plt.cm.Spectral

Understanding the usage of cmap = plt.cm.Spectral. %matplotlib inline; import numpy as np; import matplotlib.pyplot as plt; np.random.seed(1) # produce the same random numbers on every run; X = np.random.randn(1, 10); Y = np.random.randn(1, 10); label = np.arra

2018-01-27 10:46:21 20922 3

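Original: The ID3 Decision Tree Algorithm and Its Implementation

0. Information theory: the channel model and the meaning of information. Information theory is the theory of the nature of information and the laws governing its transmission. Channel model: source (sender) -> channel -> sink (receiver). 1. Communication is the process of transmitting information in an environment with random interference. 2. The sink's prior uncertainty about the source: before communication, the sink cannot know the state of the source exactly. 3. The sink's posterior uncertainty about the source: after communication, because of interference, the sink is still uncertain about the information it received. 4. The posterior uncertainty is always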

2018-01-12 21:32:27 14106
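
ID3 chooses the feature to split on by information gain, which is built on Shannon entropy as a measure of uncertainty. The following is a minimal sketch of those two quantities, not the implementation from the post; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H = -sum_k p_k * log2(p_k), in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: feature 0 determines the label, feature 1 does not.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, 0))
print(information_gain(rows, labels, 1))
```

On this toy data the first feature removes all uncertainty about the label and the second removes none, so ID3 would split on the first.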

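Original: Machine Learning Notes - Validation

Regularization can be used to avoid overfitting. A supervised machine learning problem can be summarized as minimizing the error while regularizing the parameters. Minimizing the error makes the model fit the training data, while regularizing the parameters prevents the model from fitting the training data too closely. Concretely, regularization means that we do not focus only on minimizing $E_{in}$; instead we add a regularizer to $E_{in}$ and obtain the augmented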

2018-01-03 19:49:40 7470

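Original: Machine Learning Notes - Regularization

Regularized Hypothesis Set. The previous post noted that the biggest danger in machine learning is overfitting. Overfitting can occur when the model is too complex, the data are few, the data are noisy, or the target function is very complex. Regularization can be seen as one way to deal with overfitting. The figure on the right shows a typical case of overfitting: the dataset has 5 points, and when we use a polynomial of degree 4 or even higher to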

2018-01-03 19:46:23 6777

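Original: Machine Learning Notes - Hazard of Overfitting

What is overfitting? Adding a nonlinear transformation to a linear model is a convenient way to produce a nonlinear model for our learning task. The drawback is the extra model complexity we pay for. This extra complexity causes a problem in machine learning that arises easily and is hard to solve: overfitting. This section first analyzes the causes of overfitting and then gives ways to address it. Example: the above is a one-dimensional regression example with 5 data points in total; x is generated randomly, and y is obtained by plugging x into a quadratic polynomial and then adding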

2018-01-03 19:41:28 1350

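Original: Machine Learning Notes - Nonlinear Transformation

This series has four posts and consists of study notes on Hsuan-Tien Lin's Machine Learning Foundations. A linear model can be turned into a nonlinear model through a nonlinear transformation, which strengthens the model's ability to capture the data but leads to a very common problem in machine learning: overfitting. A regularization factor is introduced to address this problem. Validation methods are then introduced to choose the regularization factor, the model, and the parameters. Machine Learning Notes - Nonlinear Transformation. Nonlinear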

2018-01-03 19:37:57 7282

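Original: Machine Learning Notes - Matrix Factorization

Matrix Factorization. Linear Network Hypothesis. The previous post introduced the RBF Network; put simply, this model can be seen as a linear combination of similarities to many different centers, where the centers are computed with the $k$-Means clustering algorithm. The goal of a machine learning algorithm is to learn some skill from data. For example, in a classic scenario, from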

2017-12-22 16:26:23 7989

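Original: Machine Learning Notes - Radial Basis Function Network

Radial Basis Function Network. We start from a model introduced earlier, the Gaussian SVM. Put simply, this model adds a Gaussian kernel to the SVM, which makes it possible to find the maximum-margin separating hyperplane in an infinite-dimensional space. The resulting classifier is: $g_{svm}(x) = sign\bigg(\sum_{SV}\alpha_n y_n \exp(-\gamma\|x-x_n\|^2)+b\bigg) \quad (1)$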

2017-12-20 20:49:43 11858
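
The Gaussian SVM classifier in equation (1) of the excerpt above, $sign(\sum_{SV}\alpha_n y_n \exp(-\gamma\|x-x_n\|^2)+b)$, can be evaluated directly once the support vectors and coefficients are known. The sketch below only evaluates the formula; the support vectors, `alphas`, `gamma`, and `b` are hypothetical values picked for illustration, not the result of training.

```python
import math


def gaussian_svm_predict(x, support_vectors, alphas, labels, gamma, b):
    """sign(sum over SVs of alpha_n * y_n * exp(-gamma * ||x - x_n||^2) + b)."""
    score = b
    for x_n, alpha_n, y_n in zip(support_vectors, alphas, labels):
        sq_dist = sum((xi - xni) ** 2 for xi, xni in zip(x, x_n))
        score += alpha_n * y_n * math.exp(-gamma * sq_dist)
    return 1 if score >= 0 else -1


# Hypothetical values, not the output of a trained SVM.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]

print(gaussian_svm_predict((0.1, 0.1), support_vectors, alphas, labels, gamma=1.0, b=0.0))
print(gaussian_svm_predict((1.9, 1.9), support_vectors, alphas, labels, gamma=1.0, b=0.0))
```

Each support vector votes with its label, and the vote decays with squared distance from the query point.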

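Original: Machine Learning Notes - Deep Learning

The previous post showed that a neural network has layer after layer of neurons whose role is to help us recognize patterns in the data and treat those patterns as features. The BP algorithm computes the gradients, so GD-type algorithms can update every weight, finally yielding the weight $w_{ij}^{(l)}$ of every neuron in the network. So the core of a neural network is these layers of neurons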

2017-12-18 15:36:09 7262

Original: Machine Learning Notes - Neural Network

Neural Network. Motivation. We start from the familiar perceptron; a perceptron is simply, from

2017-12-09 20:41:38 7339

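Original: Binary Classification with SVM in sklearn

Theory: linear support vector machine; dual support vector machine; kernel support vector machine; soft-margin support vector machine; Kernel Logistic Regression; Support Vector Regression (SVR). SVMs with different kernel functions implemented with sklearn: SVMs with different kernels are applied to a binary classification problem and the classification results are visualized. # -*- coding: utf-8 -*- import numpy as np import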

2017-12-04 08:37:24 15603 5

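Original: Machine Learning Notes - Gradient Boosted Decision Tree

The previous post introduced Random Forest, which uses the bootstrapping mechanism from Bagging to obtain different Decision Trees and then aggregates them. Beyond basic Bagging and Decision Trees, Random Forest also adds more randomness to the Decision Tree. With these mechanisms, the algorithm can use OOB data for self-V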

2017-11-29 09:47:37 8783

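Original: Machine Learning Notes - Random Forest

The random forest algorithm. Review of Bagging and Decision Tree. This post covers the random forest algorithm in machine learning. First, a review of two models mentioned in earlier posts: Bagging and Decision Tree. The main procedure of Bagging is to use bootstrapping to draw, from the original data $\mathcal{D}$, different datasets $\tilde{\mathcal{D}}_t$ of size $N'$, and these datasets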

2017-11-29 09:43:33 7405

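Original: Machine Learning Notes - Decision Tree

The previous post explained the Adaptive Boosting algorithm, which has two characteristics. First, in round $t$ it adjusts the weight of every sample so that round $t+1$ yields a $g_{t+1}$ different from $g_t$. Second, it computes a value from the performance of $g_t$ and uses it as the weight for linearly combining $g_t$ into $G$. This algorithm has been shown to achieve strong results even when the base learner is not very strong. Decision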

2017-11-29 09:39:36 7448

Original: Machine Learning Notes - Support Vector Regression (SVR)

Support Vector Regression (SVR). The previous post covered KLR (kernel logistic regression). Its starting point was that we want to use the powerful SVM tool for soft binary classification, and we have two options. The first

2017-11-26 20:22:31 12086

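Original: Machine Learning Notes - Adaptive Boosting

Motivation of Boosting. Recognizing apples: a teacher wants to teach children how to recognize apples using the following 20 samples, of which the first ten are apples and the last ten are not. Teacher: Michael, the first ten pictures are apples and the ten below are not. From what you observe, what do you think an apple looks like? Michael: I think apples are round. If we follow Michael's rule, all the children will think that anything round is an apple; under this simple rule, some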

2017-11-24 09:20:41 7081

Original: Machine Learning Notes - Blending and Bagging

Why use aggregation? If we already have some hypotheses, or some features, that can help us make predictions, how can we combine these existing hypotheses or predictive features so that they work better together? Such a model is called aggreg

2017-11-23 20:01:27 7114 1

Original: Machine Learning Notes - Kernel Logistic Regression

Kernel Logistic Regression. This post introduces the combination of Logistic Regression with kernel functions. That is, the question is: if we want to apply the kernel trick to logistic regression, what should we do? Soft-Margin SVM as Regularized Model. Review

2017-11-16 21:15:36 4921 1

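Original: Detecting Counterfeit Banknotes with Logistic Regression

Introduction. This post uses logistic regression to tell genuine banknotes from counterfeit ones; for a detailed explanation of logistic regression, see the linked post. Stochastic gradient descent (SGD) is used to fit the logistic regression, and the dataset is a banknote dataset. The dataset has 1732 samples, each with 4 features. $y$ equal to 0 indicates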

2017-10-31 18:54:24 1859
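
Fitting logistic regression with stochastic gradient descent, as the excerpt above describes, can be sketched as follows. This is not the code from the post: the learning rate, epoch count, function names, and the two-feature toy data are assumptions made for the example, and the banknote dataset itself is not used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples, targets, lr=0.1, epochs=200, seed=0):
    """Fit logistic regression by updating the parameters on one sample at a time."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = samples[i], targets[i]
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical, linearly separable toy data (not the banknote dataset).
samples = [(0.0, 0.2), (0.3, 0.1), (0.2, 0.4), (2.0, 1.8), (1.7, 2.2), (2.1, 2.0)]
targets = [0, 0, 0, 1, 1, 1]
weights, bias = train_sgd(samples, targets)
print([predict(weights, bias, x) for x in samples])
```

Updating on one sample at a time keeps each step cheap; the fixed seed only makes the shuffling reproducible.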

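Original: Importing a JSON File into MongoDB

The command for importing a JSON-format file into MongoDB is mongoimport. In the example below, mongoimport imports the contents of the file contacts.json into the contacts collection of the users database: mongoimport --db users --collection contacts --file contacts.json. Change to the bin directory and start the service: ./mongod --dbpath /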

2017-10-26 14:24:58 43310 3

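Original: Notes on Installing and Using word2vec

word2vec basics. Basic concepts: word2vec is a tool open-sourced by Google in 2013 whose core idea is to map word representations to corresponding real-valued vectors. It currently uses the following two models: CBOW (Continuous Bag-Of-Words) and Skip-Gram (project link: https://code.google.com/archive/p/word2vec). Background: word vectors. A word vector is used to turn the words of a language into mathematical form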

2017-09-26 20:35:07 2764

python3.5.2-amd64.exe

setuptools-0.6c11.win32-py2.7.exe

scipy-0.15.1-win32-superpack-python2.7.exe

python-dateutil-2.2.win-amd64-py2.7.exe

pyparsing-2.0.3.win-amd64-py2.7.exe

numpy-MKL-1.8.0.win-amd64-py2.7.exe

matplotlib-1.3.1.win-amd64-py2.7.exe

jieba word segmentation package

pymongo Linux installation package

PSCC crack solution

Flink parallelism question