
ML in coding
mmc2015
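School of Electronics Engineering and Computer Science, Peking University; interested in deep reinforcement learning. http://net.pku.edu.cn/~maohangyu/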
Machine learning in coding (Python): concatenating raw data; generating higher-order features
Concatenating the raw data:
    train_data = pd.read_csv('train.csv')
    test_data = pd.read_csv('test.csv')
    all_data = np.vstack((train_data.ix[:,1:-1], test_data.ix[:,1:-1]))
The vstack and hstack functions for combining arrays in numpy:
    >>> a = np.ones((…
Original · 2015-08-10 21:33:02 · 1445 views · 0 comments
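As a minimal sketch of the stacking step in the excerpt above, the example below uses small in-memory arrays as stand-ins for the feature columns of train.csv and test.csv (the values are made up). Note that `.ix` was removed in pandas 1.0, so current code would slice with `.iloc` before stacking. The helper `add_pairwise_products` is a hypothetical illustration of one way to generate second-order features; the post's own method is not visible in the excerpt.

```python
import itertools
import numpy as np

# Toy stand-ins for the feature columns of train.csv and test.csv (made-up values).
train_features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
test_features = np.array([[7.0, 8.0]])

# vstack stacks row-wise: both blocks must have the same number of columns.
all_data = np.vstack((train_features, test_features))    # shape (4, 2)

# hstack stacks column-wise: both blocks must have the same number of rows.
wide = np.hstack((train_features, train_features ** 2))  # shape (3, 4)


def add_pairwise_products(data):
    """Append the product of every pair of distinct columns (second-order features).

    Hypothetical helper; assumes data has at least two columns.
    """
    pairs = itertools.combinations(range(data.shape[1]), 2)
    products = [data[:, i] * data[:, j] for i, j in pairs]
    return np.hstack((data, np.column_stack(products)))


all_data_2nd = add_pairwise_products(all_data)           # shape (4, 3)
```

Stacking train and test before generating features keeps the two sets in the same column layout; they can be split again by row count afterwards.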
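An in-depth look at FFM: principles and practice
Many of the Meituan team's articles are quite good: http://tech.meituan.com/deep-understanding-of-ffm-principles-and-practices.html
I implemented it myself in Python. It is slower than National Taiwan University's libffm, but gives better results. A few points to note: 1) the relationship between the label and the loss function... 2) the intermediate data in the gradient computation: pre…
Original · 2016-06-25 23:26:36 · 15744 views · 10 comments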
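The two points in the excerpt above can be illustrated with a rough sketch. It is not the author's implementation. It assumes labels in {-1, +1} paired with the logistic loss log(1 + exp(-y·φ)), and reads the "intermediate data" as the scalar κ = ∂loss/∂φ, computed once per sample and reused for every feature pair. That reading is an assumption, since the excerpt is cut off. Plain SGD is used for brevity, whereas libffm uses AdaGrad. The names `ffm_phi` and `sgd_step` are hypothetical.

```python
import math
import numpy as np


def ffm_phi(W, sample):
    """FFM score: sum over feature pairs of <W[j1, f2], W[j2, f1]> * x1 * x2.

    W has shape (n_features, n_fields, k); sample is a list of
    (field, feature_index, value) triples for the non-zero entries.
    """
    phi = 0.0
    for a in range(len(sample)):
        f1, j1, x1 = sample[a]
        for b in range(a + 1, len(sample)):
            f2, j2, x2 = sample[b]
            phi += float(np.dot(W[j1, f2], W[j2, f1])) * x1 * x2
    return phi


def logistic_loss(y, phi):
    """Loss for labels y in {-1, +1}: log(1 + exp(-y * phi))."""
    return math.log1p(math.exp(-y * phi))


def sgd_step(W, sample, y, lr=0.1):
    """One plain SGD update in place; returns the loss measured before the update."""
    phi = ffm_phi(W, sample)
    # kappa = d(loss)/d(phi); computed once and reused for every pair below.
    kappa = -y / (1.0 + math.exp(y * phi))
    for a in range(len(sample)):
        f1, j1, x1 = sample[a]
        for b in range(a + 1, len(sample)):
            f2, j2, x2 = sample[b]
            g1 = kappa * W[j2, f1] * x1 * x2   # gradient w.r.t. W[j1, f2]
            g2 = kappa * W[j1, f2] * x1 * x2   # gradient w.r.t. W[j2, f1]
            W[j1, f2] -= lr * g1
            W[j2, f1] -= lr * g2
    return logistic_loss(y, phi)


# Toy usage: 3 features, 2 fields, latent dimension 4, one positive sample.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2, 4))
sample = [(0, 0, 1.0), (1, 2, 1.0)]
losses = [sgd_step(W, sample, +1) for _ in range(50)]
```

If the data uses {0, 1} labels instead, they would need to be mapped to {-1, +1} (or the loss rewritten as cross-entropy) for this gradient to be correct.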
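An in-depth look at Random Forest
What makes a random forest is its randomness. Professor Lin from Taiwan described three kinds of randomness: 1) bootstrapping the samples; 2) sampling the features; 3) combining features. I read the sklearn code: it implements the first two but not the third. I also read karpathy's code (https://github.com/karpathy/Random-Forest-Matlab): it does not bootstrap the samples, and the feature-combination part is written in a way that is really too "random"…
Original · 2016-07-10 17:46:02 · 965 views · 1 comment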
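The three kinds of randomness listed above can be sketched on a toy matrix. This is an illustration only, not the code of sklearn or karpathy's repository. The data is random, and the projection matrix `P` is a hypothetical way to realize "feature combination" as random linear combinations of the sampled columns. In sklearn, the first two are exposed through the `bootstrap` and `max_features` parameters of the random forest estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # toy data: 8 samples, 5 features

# 1) Bootstrap: draw n sample indices with replacement.
boot_idx = rng.integers(0, X.shape[0], size=X.shape[0])
X_boot = X[boot_idx]

# 2) Feature sampling: keep a random subset of columns, without replacement.
feat_idx = rng.choice(X.shape[1], size=3, replace=False)
X_sub = X_boot[:, feat_idx]

# 3) Feature combination: project onto random linear combinations of the
#    sampled features, so each new column mixes several original ones.
P = rng.normal(size=(3, 2))
X_comb = X_sub @ P
```

Each tree in the forest would repeat these draws with fresh random numbers, which is what decorrelates the trees.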
Complete Guide to Parameter Tuning in XGBoost (with codes in Python)
http://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
Introduction: If things don't go your way in predictive modeling, use XGBoost.
Repost · 2016-03-30 22:06:21 · 3569 views · 0 comments
Machine learning in coding (Python): polynomial curve fitting in Python
Here is an example of fitting a polynomial:
    import pandas as pd
    import numpy as np
    import scipy as sp
    import matplotlib.pyplot as plt
    from sklearn.pipeline…
Original · 2015-09-07 10:54:53 · 1949 views · 0 comments
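The excerpt above is cut off at its imports, which suggest a scikit-learn pipeline. As a smaller stand-in, the sketch below fits a polynomial with `numpy.polyfit` instead. The data is made up: noise-free samples of y = 2x² − 3x + 1.

```python
import numpy as np

# Noise-free samples from y = 2x^2 - 3x + 1 (made-up example data).
x = np.linspace(-2.0, 2.0, 9)
y = 2.0 * x**2 - 3.0 * x + 1.0

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first.
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted polynomial at the sample points.
fitted = np.polyval(coeffs, x)
```

With noisy data the degree becomes a model-selection choice: a degree that is too high will follow the noise rather than the underlying curve.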
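A roundup of source code from Kaggle machine learning competition champions and top finishers
http://dataunion.org/14892.html
A curated collection of source code and discussions from Kaggle competitions. Algorithmic Trading Challenge: Solution whitepaper; Solution thread. Allstate Purchase Prediction Challenge: Rank 2 so…
Repost · 2015-08-06 19:56:11 · 10048 views · 0 comments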
Machine learning in coding (Python): building a predictive model with xgboost
Continuing from the previous post:
    def xgboost_pred(train,labels,test):
        params = {}
        params["objective"] = "reg:linear"
        params["eta"] = 0.005
        params["min_child_weight"] = 6
        params["subsample"] = 0.7
        params["colsample…
Original · 2015-08-05 22:06:37 · 6994 views · 5 comments
Machine learning in coding (Python): an introduction to the pandas DataFrame data structure
Importing modules:
    import pandas as pd
    import numpy as np  # pandas depends on numpy
    from sklearn import preprocessing
    import xgboost as xgb
Overview of commonly used functionality:
    #load train and test
    train = pd.read_csv('train.csv', index_co…
Original · 2015-08-05 22:02:39 · 3145 views · 2 comments
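To make the DataFrame operations hinted at above concrete, here is a small sketch that builds a frame in memory instead of reading train.csv. The column names (`Id`, `feature_a`, `target`) and values are invented for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# A small in-memory frame standing in for train.csv (made-up columns and values).
train = pd.DataFrame(
    {"Id": [1, 2, 3], "feature_a": [0.5, 1.5, 2.5], "target": [10, 20, 30]}
).set_index("Id")

labels = train["target"]                  # select one column as a Series
features = train.drop(columns="target")   # drop a column, returning a new frame
values = np.array(features)               # convert to a plain numpy array
first_row = train.iloc[0]                 # positional row access
row_id_2 = train.loc[2]                   # label-based row access via the index
```

Setting the index with `set_index("Id")` plays the same role as passing an index column to `pd.read_csv`: the identifier stops being a feature column and becomes the row label.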
Machine learning in coding (Python): merging multiple tables on a key (building combined features)
Three tables: train_set.csv, test_set.csv, and feature.csv. The three tables are linked by object_id.
    import pandas as pd
    import numpy as np
    # load training and test datasets
    train = pd.read_csv('../input/train_set.csv')
    test = pd.re…
Original · 2015-08-02 17:14:38 · 2339 views · 0 comments
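A minimal sketch of the join described above, using in-memory frames in place of the three CSV files. Only the key name `object_id` comes from the excerpt. The other columns (`label`, `width`, `height`) and the derived `area` feature are assumptions made for illustration.

```python
import pandas as pd

# In-memory stand-ins for train_set.csv, test_set.csv and feature.csv.
train = pd.DataFrame({"object_id": [1, 2, 3], "label": [0, 1, 0]})
test = pd.DataFrame({"object_id": [2, 4]})
feature = pd.DataFrame(
    {
        "object_id": [1, 2, 3, 4],
        "width": [1.0, 2.0, 3.0, 4.0],
        "height": [10.0, 20.0, 30.0, 40.0],
    }
)

# A left join keeps every train/test row and attaches its matching feature row.
train_merged = pd.merge(train, feature, on="object_id", how="left")
test_merged = pd.merge(test, feature, on="object_id", how="left")

# One possible combined feature built from the merged columns.
train_merged["area"] = train_merged["width"] * train_merged["height"]
test_merged["area"] = test_merged["width"] * test_merged["height"]
```

A left join is chosen here so that rows without a match in the feature table are kept (with NaN values) rather than silently dropped.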
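scikit-learn: things I have actually used in real projects (summary)
0. Common to all projects:
http://blog.youkuaiyun.com/mmc2015/article/details/46851245 (dataset formats and predictors)
http://blog.youkuaiyun.com/mmc2015/article/details/46852755 (loading your own raw data) (loading an entire corpus, suited to text classification problems)
http://blog.youkuaiyun.com/mmc2…
Original · 2015-07-27 08:34:35 · 7156 views · 4 comments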
Machine learning in coding (Python): greedy search for feature selection
    print("Performing greedy feature selection...")
    score_hist = []
    N = 10
    good_features = set([])
    # Greedy feature selection loop
    while len(score_hist) < 2 or score_hist[-1][0] > score_hist[-2][0]:
        scores = []
        for f in ran…
Original · 2015-08-11 20:32:06 · 1540 views · 0 comments
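The loop in the excerpt above is truncated, so the following is only a sketch of greedy forward selection in the same spirit: keep adding the single best feature while the newest score beats the previous one, then drop the last addition. The scorer `score_subset` (negative held-out MSE of a least-squares fit) and the synthetic data are assumptions made to keep the example self-contained. The original post scores candidates with its own cross-validation routine.

```python
import numpy as np


def score_subset(X, y, cols):
    """Hypothetical scorer: negative held-out MSE of a least-squares fit."""
    half = X.shape[0] // 2
    A_train, A_test = X[:half][:, cols], X[half:][:, cols]
    w, *_ = np.linalg.lstsq(A_train, y[:half], rcond=None)
    return -float(np.mean((A_test @ w - y[half:]) ** 2))


def greedy_select(X, y):
    """Greedy forward selection; returns the chosen column indices in order."""
    good_features, score_hist = [], []
    # Keep adding features while the newest score beats the previous one.
    while len(score_hist) < 2 or score_hist[-1][0] > score_hist[-2][0]:
        candidates = [f for f in range(X.shape[1]) if f not in good_features]
        if not candidates:
            return good_features          # every feature helped; keep them all
        scores = [(score_subset(X, y, good_features + [f]), f) for f in candidates]
        best = max(scores)                # highest score wins
        good_features.append(best[1])
        score_hist.append(best)
    # The last feature added did not improve the score, so drop it.
    good_features.pop()
    return good_features


# Synthetic data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
selected = greedy_select(X, y)
```

Greedy selection evaluates on the order of d² subsets for d features rather than 2^d, at the cost of possibly missing features that are only useful in combination.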
Machine learning in coding (Python): using cross-validation to choose model hyperparameters
    # Hyperparameter selection loop
    score_hist = []
    Cvals = [0.001, 0.003, 0.006, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.1]
    for C in Cvals:
        model.C = C
        score = cv_loop(Xt, y, model, N)
        score_hi…
Original · 2015-08-11 20:45:18 · 1938 views · 0 comments
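The excerpt above calls a `cv_loop` helper whose body is not shown. The sketch below follows the same pattern (score each candidate value by cross-validation and keep the best) but swaps in a self-contained stand-in: ridge regression with penalty `lam`, scored by k-fold negative MSE. The function `ridge_cv_score`, the candidate grid, and the synthetic data are all assumptions, not the author's code.

```python
import numpy as np


def ridge_cv_score(X, y, lam, n_folds=5):
    """Mean negative validation MSE of ridge regression over n_folds folds."""
    folds = np.array_split(np.arange(X.shape[0]), n_folds)
    scores = []
    for val_idx in folds:
        train_mask = np.ones(X.shape[0], dtype=bool)
        train_mask[val_idx] = False
        A, b = X[train_mask], y[train_mask]
        # Closed-form ridge solution: (A^T A + lam * I)^-1 A^T b
        w = np.linalg.solve(A.T @ A + lam * np.eye(X.shape[1]), A.T @ b)
        scores.append(-np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(scores))


# Synthetic regression data (made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# Hyperparameter selection loop: score every candidate, keep the best.
candidates = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
score_hist = []
for lam in candidates:
    score_hist.append((ridge_cv_score(X, y, lam), lam))
best_score, best_lam = max(score_hist)
```

Storing `(score, value)` tuples means `max` (or `sorted`) picks the winner directly, which mirrors how the excerpt accumulates its `score_hist`.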
Time Series Forecasting with the Long Short-Term Memory Network in Python
http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
by Jason Brownlee on April 7, 2017 in Deep Learning
The L…
Repost · 2017-04-09 10:10:48 · 2983 views · 0 comments