Spark Machine Learning Overview

Overview of Algorithms in the Spark ML Library

Latest recommended article published on 2020-09-09 14:26:06

Licensed under CC 4.0 BY-SA

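This article gives an overview of the main statistical and data mining algorithms in the Spark ML library, including classification and regression algorithms such as SVM and logistic regression; clustering algorithms such as k-means and Gaussian mixture; dimensionality reduction algorithms such as SVD and PCA; and feature extraction and transformation methods.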

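Spark's ML (Machine Learning) library provides implementations of the mainstream statistical and data mining algorithms. In this article William gives an overview; detailed analyses will follow in later articles.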

Classification and regression

Algorithm	Spark algorithm class	Spark model class
SVM (support vector machine)	SVMWithSGD	SVMModel
Logistic regression	LogisticRegressionWithLBFGS; LogisticRegressionWithSGD	LogisticRegressionModel
Linear regression	LinearRegressionWithSGD	LinearRegressionModel
Streaming (real-time) linear regression	StreamingLinearRegressionWithSGD	LinearRegressionModel
Ridge regression	RidgeRegressionWithSGD	RidgeRegressionModel
Lasso regression	LassoWithSGD	LassoModel
Naive Bayes	NaiveBayes	NaiveBayesModel
Decision tree	DecisionTree	DecisionTreeModel
Random forest	RandomForest	RandomForestModel
Gradient-Boosted Trees	GradientBoostedTrees	GradientBoostedTreesModel
Isotonic regression	IsotonicRegression	IsotonicRegressionModel
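
Each row of the table pairs a trainer ("algorithm") class with the model class it produces: the trainer fits parameters from labeled data, and the resulting model object is used only for prediction. The following pure-Python sketch imitates that split for linear regression fitted by stochastic gradient descent. It is not Spark code: `ToyLinearRegressionWithSGD`, `ToyLinearRegressionModel`, and every parameter name are hypothetical, chosen only to mirror the naming pattern in the table.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToyLinearRegressionModel:
    """Hypothetical model class: stores fitted parameters and only predicts."""
    weights: Tuple[float, ...]
    intercept: float

    def predict(self, features: Sequence[float]) -> float:
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))


class ToyLinearRegressionWithSGD:
    """Hypothetical trainer class: consumes labeled points, returns a model."""

    @staticmethod
    def train(
        data: List[Tuple[float, Sequence[float]]],
        iterations: int = 2000,
        step: float = 0.1,
        seed: int = 0,
    ) -> ToyLinearRegressionModel:
        rng = random.Random(seed)
        weights = [0.0] * len(data[0][1])
        intercept = 0.0
        for _ in range(iterations):
            # One SGD step on the squared error of a single random sample.
            label, features = rng.choice(data)
            prediction = intercept + sum(w * x for w, x in zip(weights, features))
            error = prediction - label
            weights = [w - step * error * x for w, x in zip(weights, features)]
            intercept -= step * error
        return ToyLinearRegressionModel(tuple(weights), intercept)


# (label, features) pairs generated from y = 2x + 1, with no noise.
points = [(2.0 * x + 1.0, [x]) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
model = ToyLinearRegressionWithSGD.train(points)
print(model.predict([3.0]))  # should be close to 7
```

The trainer holds no state after `train` returns; everything needed for prediction lives in the immutable model object, which is the same division of labor the two class columns describe.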

Collaborative filtering

Algorithm	Spark algorithm class	Spark model class
alternating least squares (ALS)	ALS	MatrixFactorizationModel

Clustering

Algorithm	Spark algorithm class	Spark model class
k-means	KMeans	KMeansModel
Gaussian mixture	GaussianMixture	GaussianMixtureModel
power iteration clustering (PIC)	PowerIterationClustering	PowerIterationClusteringModel
latent Dirichlet allocation (LDA)	LDA	DistributedLDAModel (EM optimizer); LocalLDAModel (online optimizer)
streaming k-means	StreamingKMeans	StreamingKMeansModel
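
To make the clustering rows concrete, here is a minimal pure-Python sketch of batch k-means (Lloyd's algorithm) with the same trainer/model split. It is an illustration under simplifying assumptions, not Spark's implementation: `ToyKMeans` and `ToyKMeansModel` are hypothetical names, and the sketch naively seeds the centers with the first `k` points.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass(frozen=True)
class ToyKMeansModel:
    """Hypothetical model class: stores cluster centers and assigns points."""
    cluster_centers: Tuple[Point, ...]

    def predict(self, point: Sequence[float]) -> int:
        # Index of the nearest center.
        return min(
            range(len(self.cluster_centers)),
            key=lambda i: _sq_dist(point, self.cluster_centers[i]),
        )


class ToyKMeans:
    """Hypothetical trainer class running Lloyd's algorithm."""

    @staticmethod
    def train(points: List[Point], k: int, max_iterations: int = 20) -> ToyKMeansModel:
        # Naive initialization (an assumption of this sketch): first k points.
        centers = [tuple(p) for p in points[:k]]
        for _ in range(max_iterations):
            # Assignment step: group each point with its nearest center.
            groups: List[List[Point]] = [[] for _ in range(k)]
            for p in points:
                idx = min(range(k), key=lambda i: _sq_dist(p, centers[i]))
                groups[idx].append(p)
            # Update step: move each center to the mean of its group;
            # an empty group keeps its previous center.
            new_centers = [
                tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
            if new_centers == centers:
                break  # assignments are stable, so the centers no longer move
            centers = new_centers
        return ToyKMeansModel(tuple(centers))


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
model = ToyKMeans.train(data, k=2)
print(model.cluster_centers)
```

A streaming variant would update the centers incrementally as new batches arrive instead of iterating over a fixed dataset; this sketch covers only the batch case.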