mahout 命令与类相应的映射文件

最新推荐文章于 2024-10-10 12:36:37 发布
原创最新推荐文章于 2024-10-10 12:36:37 发布 · 919 阅读
0 ·
CC 4.0 BY-SA版权
文章标签：
#mahoutm #机器学
本文介绍了Apache Mahout机器学习库中的各种实用工具和算法，涵盖了从数据预处理到推荐系统的一系列功能。其中包括向量化工具、数学运算、聚类、频繁项集挖掘、分类、隐马尔科夫模型等算法。
摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >
在mahout中是通过MahoutDriver来运行我们自己编写的和它自带的程序的main函数,以下是driver.classes.props文件
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
org.apache.mahout.utils.vectors.lucene.Driver = lucene.vector : Generate Vectors from a Lucene index
org.apache.mahout.utils.vectors.arff.Driver = arff.vector : Generate Vectors from an ARFF file or directory
org.apache.mahout.utils.vectors.RowIdJob = rowid : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
org.apache.mahout.utils.SplitInput = split : Split Input data into test and train sets
org.apache.mahout.utils.MatrixDumper = matrixdump : Dump matrix in CSV format
org.apache.mahout.utils.regex.RegexConverterDriver = regexconverter : Convert text files on a per line basis based on regular expressions
org.apache.mahout.text.SequenceFilesFromDirectory = seqdirectory : Generate sequence files (of Text) from a directory
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles = seq2sparse: Sparse Vector generation from Text sequence files
org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFiles = seq2encoded: Encoded Sparse Vector generation from Text sequence files
org.apache.mahout.text.WikipediaToSequenceFile = seqwiki : Wikipedia xml dump to sequence file

#Math
org.apache.mahout.math.hadoop.TransposeJob = transpose : Take the transpose of a matrix
org.apache.mahout.math.hadoop.MatrixMultiplicationJob = matrixmult : Take the product of two matrices
org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver = svd : Lanczos Singular Value Decomposition
org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob = cleansvd : Cleanup and verification of SVD output
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob = rowsimilarity : Compute the pairwise similarities of the rows of a matrix
org.apache.mahout.math.hadoop.similarity.VectorDistanceSimilarityJob =  vecdist : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli = ssvd : Stochastic SVD
#Clustering
org.apache.mahout.clustering.kmeans.KMeansDriver = kmeans : K-means clustering
org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver = fkmeans : Fuzzy K-means clustering
org.apache.mahout.clustering.minhash.MinHashDriver = minhash : Run Minhash clustering
org.apache.mahout.clustering.lda.LDADriver = lda : Latent Dirchlet Allocation
org.apache.mahout.clustering.lda.LDAPrintTopics = ldatopics : LDA Print Topics
org.apache.mahout.clustering.lda.cvb.CVB0Driver = cvb : LDA via Collapsed Variation Bayes (0th deriv. approx)
org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0 = cvb0_local : LDA via Collapsed Variation Bayes, in memory locally.
org.apache.mahout.clustering.dirichlet.DirichletDriver = dirichlet : Dirichlet Clustering
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver = meanshift : Mean Shift clustering
org.apache.mahout.clustering.canopy.CanopyDriver = canopy : Canopy clustering
org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver = eigencuts : Eigencuts spectral clustering
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver = spectralkmeans : Spectral k-means clustering
org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorDriver = clusterpp : Groups Clustering Output In Clusters
#Freq. Itemset Mining
org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver = fpg : Frequent Pattern Growth
#Classification
#old bayes
org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups = prepare20newsgroups : Reformat 20 newsgroups data
org.apache.mahout.classifier.bayes.WikipediaXmlSplitter = wikipediaXMLSplitter : Reads wikipedia data and creates ch
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver = wikipediaDataSetCreator : Splits data set of wikipedia wrt feature like country
org.apache.mahout.classifier.bayes.TestClassifier = testclassifier : Test the text based Bayes Classifier
org.apache.mahout.classifier.bayes.TrainClassifier = trainclassifier : Train the text based Bayes Classifier
#new bayes
org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob = trainnb : Train the Vector-based Bayes classifier
org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver = testnb : Test the Vector-based Bayes classifier
#SGD
org.apache.mahout.classifier.sgd.TrainLogistic = trainlogistic : Train a logistic regression using stochastic gradient descent
org.apache.mahout.classifier.sgd.RunLogistic = runlogistic : Run a logistic regression model against CSV data
org.apache.mahout.classifier.sgd.PrintResourceOrFile = cat : Print a file or resource as the logistic regression models would see it
org.apache.mahout.classifier.sgd.TrainAdaptiveLogistic = trainAdaptiveLogistic : Train an AdaptivelogisticRegression model
org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic = validateAdaptiveLogistic : Validate an AdaptivelogisticRegression model against hold-out data set
org.apache.mahout.classifier.sgd.RunAdaptiveLogistic = runAdaptiveLogistic : Score new production data using a probably trained and validated AdaptivelogisticRegression model
#HMM
org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer = baumwelch : Baum-Welch algorithm for unsupervised HMM training
org.apache.mahout.classifier.sequencelearning.hmm.ViterbiEvaluator = viterbi : Viterbi decoding of hidden states from given output states sequence
org.apache.mahout.classifier.sequencelearning.hmm.RandomSequenceGenerator = hmmpredict : Generate random sequence of observations by given HMM
#Classifier Utils
org.apache.mahout.classifier.ConfusionMatrixDumper = cmdump : Dump confusion matrix in HTML or text formats

#Recommenders
org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter = splitDataset : split a rating dataset into training and probe parts
org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator = evaluateFactorization : compute RMSE and MAE of a rating matrix factorization against probes
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob = itemsimilarity : Compute the item-item-similarities for item-based collaborative filtering
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob = recommenditembased : Compute recommendations using item-based collaborative filtering
org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob = parallelALS : ALS-WR factorization of a rating matrix
org.apache.mahout.cf.taste.hadoop.als.RecommenderJob = recommendfactorized : Compute recommendations using the factorization of a rating matrix

#Link Analysis
org.apache.mahout.graph.linkanalysis.PageRankJob = pagerank : compute the PageRank of a graph
org.apache.mahout.graph.linkanalysis.RandomWalkWithRestartJob = randomwalkwithrestart : compute all other vertices' proximity to a source vertex in a graph