
spark
南若安好
Creating an RDD in Spark with the parallelize method
An RDD is created by calling SparkContext's parallelize method on an existing Scala collection (a Seq object). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
Reposted 2015-11-09 14:54:02 · 53128 views · 0 comments
Connecting Spark to a MySQL database (Python)
1. Put the mysql-connector-java-5.1.22-bin.jar file into the /opt/spark/lib/ directory (another directory also works).
2. Edit spark-env.sh and add: export SPARK_CLASSPATH=/opt/spark/lib/mysql-connector-java-5.1.22-bin.jar:$SPARK_CLASSPATH
3. The Spark program
Original 2015-12-09 21:22:41 · 7473 views · 1 comment
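The third step is cut off in the listing above; a hedged sketch of what such a program might look like with the Spark 1.x JDBC reader, which is the API contemporary with this post (the host, database, table name, and credentials below are placeholder assumptions, and running it requires a reachable MySQL server):

```python
# Hypothetical Spark 1.x program reading a MySQL table over JDBC.
# All connection details (localhost, testdb, users, root/secret) are
# placeholders; substitute your own. Requires a running MySQL server and
# the connector jar on SPARK_CLASSPATH as described in steps 1 and 2.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[2]", "mysql-demo")
sqlContext = SQLContext(sc)

df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/testdb",
    driver="com.mysql.jdbc.Driver",  # class shipped in mysql-connector-java-5.1.22-bin.jar
    dbtable="users",
    user="root",
    password="secret",
).load()

df.show()  # prints the first rows of the table
sc.stop()
```

The driver option must name the JDBC driver class from the jar installed in step 1, which is why the jar has to be on SPARK_CLASSPATH before the program starts.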
Frequently Asked Questions in Spark
Frequently Asked Questions
Using PredictionIO
Q: How do I check to see if various dependencies, such as Elasticsearch and HBase, are running?
A: You can run $ pio status from the terminal.
Reposted 2015-11-17 19:32:28 · 922 views · 0 comments