How to use scikits-learn
- Install scikits-learn with easy_install or pip
pip install -U scikit-learn
easy_install -U scikit-learn
* A simple computation example
from sklearn import datasets
# load_boston() was removed in scikit-learn 1.2; the bundled diabetes regression dataset is used instead
dataset = datasets.load_diabetes()
print("Data shape", dataset.data.shape)
print("Data max = %s min = %s" % (dataset.data.max(), dataset.data.min()))
print("Target max = %s min = %s" % (dataset.target.max(), dataset.target.min()))
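The max/min values above are taken over the entire data matrix, so they mix all features together. A per-feature summary, reducing along `axis=0`, is usually more informative. A minimal sketch, assuming the bundled diabetes dataset (`load_boston` is no longer shipped with current scikit-learn):

```python
from sklearn import datasets

dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target
# axis=0 reduces over the rows (samples), giving one value per feature column
col_max = X.max(axis=0)
col_min = X.min(axis=0)
for name, hi, lo in zip(dataset.feature_names, col_max, col_min):
    print("%-4s max = %.4f min = %.4f" % (name, hi, lo))
print("Target max = %s min = %s" % (y.max(), y.min()))
```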
* A simple clustering analysis
1. Download the stock data
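import datetime
import numpy
from matplotlib import finance  # removed in matplotlib 2.2; the Yahoo quote service it used is also gone
symbols = ["AA", "AXP", "BA", "C", "CAT"]  # hypothetical example tickers; the original never defines symbols
start = datetime.datetime(2011, 1, 1)
end = datetime.datetime(2012, 1, 1)
# Fetch one quote series per symbol (the original passed '^GSPC' for every symbol)
quotes = [finance.quotes_historical_yahoo_ochl(symbol, start, end, asobject=True, adjusted=True) for symbol in symbols]
close = numpy.array([q.close for q in quotes]).astype(float)
print(close.shape)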
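The quote download in step 1 depends on `matplotlib.finance` and the Yahoo Finance service, neither of which is available any more. To run the remaining steps, a synthetic `close` array with the same layout (one row per symbol, one column per trading day) can stand in. A minimal sketch; the tickers, the two-sector structure and the volatilities are all hypothetical:

```python
import numpy

# Hypothetical stand-in for the downloaded quotes; tickers and prices are synthetic
rng = numpy.random.default_rng(0)
symbols = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
n_days = 252
# Two hypothetical "sector" return series; symbols 0-2 follow the first, 3-5 the second
sector_returns = rng.normal(0.0, 0.01, size=(2, n_days - 1))
sector_of = [0, 0, 0, 1, 1, 1]
daily = numpy.array([sector_returns[s] + rng.normal(0.0, 0.002, size=n_days - 1)
                     for s in sector_of])
# Cumulate the log returns into price paths that start at 100
start_col = numpy.zeros((len(symbols), 1))
close = 100.0 * numpy.exp(numpy.hstack([start_col, numpy.cumsum(daily, axis=1)]))
print(close.shape)  # (6, 252): one row per symbol, one column per trading day
```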
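2. Compute the affinity matrix
# Daily log returns, one row per symbol
logreturns = numpy.diff(numpy.log(close))
print(logreturns.shape)
# Squared Euclidean norm of each symbol's return series
logreturns_norms = numpy.sum(logreturns ** 2, axis=1)
# S[i, j] = -||r_i||^2 - ||r_j||^2 + 2 r_i . r_j, i.e. the negative squared Euclidean distance
S = -logreturns_norms[:, numpy.newaxis] - logreturns_norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)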
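Step 2 relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, so S[i, j] is the negative squared Euclidean distance between the log-return series of symbols i and j: 0 on the diagonal and more negative for less similar pairs. The sketch below checks the vectorised formula against a direct double loop on random, hypothetical returns:

```python
import numpy

rng = numpy.random.default_rng(1)
logreturns = rng.normal(0.0, 0.01, size=(4, 30))  # hypothetical: 4 symbols, 30 daily returns

# Vectorised form used in step 2
norms = numpy.sum(logreturns ** 2, axis=1)
S = -norms[:, numpy.newaxis] - norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)

# Brute force: minus the squared Euclidean distance between every pair of rows
S_direct = numpy.array([[-numpy.sum((a - b) ** 2) for b in logreturns] for a in logreturns])
print(numpy.allclose(S, S_direct))  # True
```

The vectorised version needs one matrix product instead of a Python loop over all pairs, which matters once there are dozens of symbols and hundreds of trading days.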
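3. Cluster with affinity propagation
import sklearn.cluster
# S is already a similarity matrix, so tell AffinityPropagation to use it as-is
aff_pro = sklearn.cluster.AffinityPropagation(affinity="precomputed").fit(S)
labels = aff_pro.labels_
for i in range(len(labels)):
    print("%s in Cluster %d" % (symbols[i], labels[i]))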
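In current scikit-learn, `AffinityPropagation.fit` treats its input as feature vectors unless `affinity="precomputed"` is set, in which case the matrix is used directly as similarities. The end-to-end sketch below runs steps 2 and 3 on synthetic log returns for six hypothetical tickers built from two shared factors, so the expected grouping is known in advance:

```python
import numpy
from sklearn.cluster import AffinityPropagation

rng = numpy.random.default_rng(0)
symbols = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]  # hypothetical tickers
# Hypothetical log returns: the first three symbols share one factor, the last three another
factors = rng.normal(0.0, 0.01, size=(2, 250))
logreturns = numpy.vstack([factors[0] + rng.normal(0.0, 0.002, size=(3, 250)),
                           factors[1] + rng.normal(0.0, 0.002, size=(3, 250))])

# Negative squared Euclidean distances, as in step 2
norms = numpy.sum(logreturns ** 2, axis=1)
S = -norms[:, numpy.newaxis] - norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)

# affinity="precomputed" makes fit() treat S as the similarity matrix itself
aff_pro = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
labels = aff_pro.labels_
for symbol, label in zip(symbols, labels):
    print("%s in Cluster %d" % (symbol, label))
```

The number of clusters is not fixed in advance; it is driven by the `preference` parameter, which defaults to the median of S. Raising the preference tends to produce more clusters.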
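What is scikits-learn
The scikits-learn project provides an API for machine learning. It also includes several datasets and sample images that can be used for experiments.
Clustering refers to a class of machine learning algorithms that group objects according to their similarity.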
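Clustering by similarity can also be tried without any downloaded data by fitting affinity propagation directly on feature vectors. With the default `affinity="euclidean"`, scikit-learn builds the negative squared Euclidean distance matrix internally, the same kind of similarity used in the stock example. A minimal sketch on six hypothetical 2-D points forming two obvious groups:

```python
import numpy
from sklearn.cluster import AffinityPropagation

# Two hypothetical, well-separated groups of 2-D points
points = numpy.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
                      [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
# Default affinity="euclidean": similarities are negative squared distances between points
model = AffinityPropagation(random_state=0).fit(points)
print(model.labels_)
print(model.cluster_centers_indices_)  # indices of the points chosen as exemplars
```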