- Blog posts (36)
Original Maven settings.xml configuration for China
<!-- mirror | Specifies a repository mirror site to use instead of a given repository. The repository that | this mirror serves has an ID that matches the mirrorOf element of this mi
2016-06-09 10:09:00
516
Original 【spark】Spark + Kafka
Start Kafka: MobaXterm_Personal_8.5.exe D:/Develop/kafka_2.10-0.8.2.1/bin/windows/zookeeper-server-start.bat D:/Develop/kafka_2.10-0.8.2.1/config/zookeeper.properties D:/Develop/kafka_2.10-0.8.2.
2016-05-12 22:16:30
527
Original 【python】NumPy, SciPy, and pandas resource list
http://blog.youkuaiyun.com/huangxia73/article/details/38065881
2016-04-05 23:28:46
485
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 12. K-Means Clustering
K-means implementation from the Spark examples: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional
2016-04-05 19:43:29
438
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 13 k-Nearest Neighbors
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import breeze.linalg.DenseVector /** * This class solves K-Nearest-Neighbor
2016-03-31 19:00:45
445
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 11 Smarter Email Marketing wit
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.Partitioner import org.apache.spark.HashPartitioner i
2016-03-30 20:10:32
403
Original 【spark source】Reading the Spark LinearRegression source code
org.apache.spark.mllib.regression.RegressionModel: defines the predict interface for linear regression models. org.apache.spark.mllib.regression.impl.GLMRegressionModel: loads a model from a file or saves a model to a file. org.apache.spark.mllib.pmml.PMMLExportabl
2016-03-28 21:42:20
738
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 10 Content-Based Recommend
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext /** * usermovieratings.txt * * User1 Movie1 1 * User1 Movie2 2 * User
2016-03-28 21:13:39
342
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation People
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer /** * friends.txt * 1 2,3,4,5,
2016-03-23 19:52:44
467
Original 【spark+nlp】 Feature Extraction and Preprocessing
Common Spark NLP methods: package com.bbw5.ml.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.ml.feature.CountVectorizer import org.apache.spark.ml.feature.Co
2016-03-22 22:20:57
1065
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation Items
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.HashMap import scala.collection.mutable.Arra
2016-03-22 18:19:21
435
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 8 Common Friends
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.HashMap /** * The FindCommonFriends is a Spa
2016-03-21 20:49:30
370
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 7 Market Basket Analysis
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer /** * finds all association rul
2016-03-17 18:22:28
369
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 6 MovingAverage
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer import org.apache.spark.Partitioner
2016-03-16 22:10:28
487
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 5 Order Inversion Pattern
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer import org.apache.spark.SparkCon
2016-03-16 20:00:16
651
Original 【storm kafka】Storm and Kafka integration
Maven configuration that resolves the conflict between log4j and slf4j: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.a
2016-03-10 23:12:16
598
Original 【storm】Installing Storm on 64-bit Windows 7
Installation guide: http://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html Start ZooKeeper (reusing Kafka's ZooKeeper): cd G:\Big-File\Architecture\storm\kafka_2.10-0.9.0.0 bin\windows\zookeeper-server-start config
2016-03-10 18:52:58
1542
Original 【kafka】Installing Kafka on 64-bit Windows 7
Quick start: http://kafka.apache.org/documentation.html#quickstart Edit Kafka's ZooKeeper configuration in config/zookeeper.properties: dataDir=G:/Big-File/Architecture/storm/kafka_2.10-0.9.0.0/zookeeper Start ZooKeeper: cd G:\Big
2016-03-10 18:18:15
2655
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 4 LeftOuterJoin
Scala version: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext /** * This class provides a basic implementation of "left outer join" * operat
2016-03-09 23:58:31
339
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 3 Top 10 NonUniqueList
package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import java.util.PriorityQueue /** * Assumption: for all input (K, V), K's are non-un
2016-03-09 23:30:03
822
Original 【spark】Recognizing MNIST digits 0-9 with MultilayerPerceptron
Only a single layer configuration (28 * 28, 100, 50, 10) was used for training, so the results are not very good. package com.bbw5.ml.spark import org.apache.spark.ml.tuning.ParamGridBuilder import org.apache.spark.SparkContext import org.apache.spark.sql.SQLContext
2016-03-09 22:07:41
1743
Original 【spark】Recognizing MNIST digits 0-1 with LogisticRegression (ML API)
ROC curve concepts: http://blog.youkuaiyun.com/abcjennifer/article/details/7359370 Recall-Precision concepts: http://blog.youkuaiyun.com/pirage/article/details/9851339 Download the MNIST dataset: http://yann.lecun.com/exdb/mnist/ Load M
2016-03-09 19:34:36
2061
Original 【spark-breeze】Installing Breeze on 64-bit Windows 7
Breeze Maven dependencies: org.scalanlp:breeze_2.10:0.11.2 and org.scalanlp:breeze-natives_2.10:0.11.2
2016-03-08 19:02:25
1897
Original 【python】Installing Python on 64-bit Windows 7
Download Python: https://www.python.org/ftp/python/2.7.11/python-2.7.11.amd64.msi Add the following paths to Path: D:\Develop\Python27;D:\Develop\Python27\Scripts Download Microsoft Visual C++ Compiler for Python 2.7 (once it is installed, you can directly
2016-03-03 20:08:35
1309
Original 【spark+python】Recognizing MNIST digits 0-1 with LogisticRegression (MLlib)
Download the dataset: http://yann.lecun.com/exdb/mnist/
2016-02-29 20:33:39
1487
Translation 【Mastering Machine Learning with scikit-learn (Python + Spark edition)】Chapter 2 Linear Regression
Source code download: https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn Start the IPython notebook: cd E:\DM\bookcode\mastering-machine-learning-scikit-learn ip
2016-02-24 22:02:55
1126
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 3 Top 10 List
Scala version of Top 10 List: package com.bmb.dataalgorithms.spark import scala.collection.mutable.PriorityQueue import org.apache.spark.Logging import org.apache.spark.SparkConf import org.apache.spark.Spa
2016-02-24 21:56:40
1032
Translation 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 1 Secondary Sort
I recently read "Data Algorithms_Recipes for Scaling up with Hadoop and Spark". Its algorithms are implemented in Java, and the source code can be downloaded from https://github.com/mahmoudparsian/data-algorithms-book/ For learning purposes, I am providing Scala versions of the algorithms. Secondary Sort: package com.
2016-02-23 22:22:03
573
Original 【spark】Spark word count example
Code: package com.test.mllib.test import org.apache.spark.SparkConf import org.apache.spark.SparkContext object WorkCountApp { def main(args: Array[String]) { var filename = "" args match
2016-02-17 20:50:36
829
Original 【spark】List of common Spark commands
Specify the libraries to load when starting spark-shell: bin\spark-shell --jars E:\DM\code\projects\ch11-testit\target\ch11-testit-1.0.0.jar Run an application with spark-submit: E:\DM\Spark\spark-1.4.1-bin-hadoop2.4\bin\spark-submit --maste
2016-02-17 19:35:28
5810
Original 【spark】pom.xml template for creating a Maven-based Spark project
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">4.0.0com.xxxxtestjartestit1.0.0nexusOS Chinahttp://maven.oschina.net/conte
2016-02-17 19:32:59
5500
Original 【spark】Building Spark 1.6.0 on 64-bit Windows 7
1: In settings.xml, set the Maven repository to http://maven.oschina.net/content/groups/public/ (this repository requires Maven 3.3.3 or later) xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/S
2016-02-17 19:17:42
445
Original 【hadoop】Installing Hadoop 2.x on 32-bit Windows 7
Install JDK 1.7. Download hadoop-2.3.0: http://archive.apache.org/dist/hadoop/core/hadoop-2.3.0/hadoop-2.3.0.tar.gz Download hadoop-common-2.2.0-bin-32.rar: https://codeload.github.com/srccodes/hadoop-common-2.2.0-b
2016-02-16 23:15:54
1981