Pitfall 0: OK, so you want to play with Spark. You head over to the official site to download one and have some fun, except the download page offers you 7 different Spark packages. Which one should you pick?
<option value="sources"> Source Code [can build several Hadoop versions] </option>
<option value="without-hadoop"> Pre-build with user-provided Hadoop [can use with most Hadoop distributions] </option>
<option value="hadoop2.6"> Pre-built for Hadoop 2.6 and later </option>
<option value="hadoop2.4"> Pre-built for Hadoop 2.4 and later </option>
<option value="hadoop2.3"> Pre-built for Hadoop 2.3 </option>
<option value="hadoop1"> Pre-built for Hadoop 1.X </option>
<option value="cdh4"> Pre-built for CDH 4 </option>
I recommend the "Pre-built for Hadoop 2.6" one; try the others if you feel like it. I downloaded the without-hadoop build and couldn't get it working after three days.
Pitfall 1: the official elasticsearch-hadoop package ships three jars with "spark" in their names. They are all decoys, kept around for compatibility with Spark 1.2, 1.3 and the like. The only jar you actually need is elasticsearch-hadoop-X.X.X.jar. Worse, if you load the wrong jar it won't tell you so; it just throws baffling errors. Ugh!
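If you want to be sure which jar the ES-Spark classes actually came from, one quick sanity check is to ask the classloader from inside spark-shell (this assumes your version ships the org.elasticsearch.spark.rdd.EsSpark helper object, which the 2.x releases do):
// Run inside spark-shell after --jars: print the jar the ES-Spark code was loaded from.
// If the import fails, or the printed path is not elasticsearch-hadoop-X.X.X.jar,
// the wrong jar is on the classpath.
import org.elasticsearch.spark.rdd.EsSpark
println(EsSpark.getClass.getProtectionDomain.getCodeSource.getLocation)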
Pitfall 2: when a beginner tests things in spark-shell, the shell helpfully creates an sc variable for you, but the tutorials online aren't written for this interactive spark-shell style. So running the following statements simply has no effect:
// What the online tutorials tell you to do; in spark-shell, though, this conf is never
// used, because the shell has already built its own SparkContext (sc):
val conf = new SparkConf().setAppName("test").setMaster("local")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "10.70.5.26")
conf.set("es.port", "9280")
And the following statement errors out right away:
val sc = new SparkContext(conf)   // fails: spark-shell has already created a SparkContext
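For what it's worth, that SparkConf plus new SparkContext pattern is what those tutorials mean for a standalone job you package and launch with spark-submit, where no context exists yet. A rough sketch under that assumption (the object name EsTest is made up for illustration; it still needs the elasticsearch-hadoop jar on the classpath):
// Standalone app, NOT spark-shell: you build the one and only SparkContext yourself.
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._
object EsTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local")
    conf.set("es.index.auto.create", "true")
    conf.set("es.nodes", "10.70.5.26")
    conf.set("es.port", "9280")
    val sc = new SparkContext(conf)   // fine here: this is the first context in the JVM
    sc.makeRDD(Seq(Map("one" -> 1))).saveToEs("spark/docs")
    sc.stop()
  }
}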
Brutal, right? So what's the correct way inside spark-shell? Take a look:
./spark-shell \
--jars /home/songrunpeng/elasticsearch-hadoop-2.1.3/dist/elasticsearch-hadoop-2.1.3.jar \
--conf spark.es.nodes=10.70.5.26 \
--conf spark.es.port=9280 \
--conf spark.es.nodes.discovery=false \
--conf spark.es.http.timeout=5m
// sc is the SparkContext that spark-shell created for you; the es.* settings were
// passed on the command line above with the "spark." prefix.
import org.apache.spark.SparkConf      // kept from the original, not strictly needed here
import org.elasticsearch.spark._       // brings in the saveToEs implicit on RDDs
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")   // writes both maps as docs into index/type spark/docs
The code above writes into an ES index called spark; once it's saved, go take a look at the data structure it created. OK, that's all for now. Once you can talk to ES, you can follow along with any other tutorial from here.
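By the way, if you want to confirm from the same shell session that those two documents really landed, here's a quick read-back sketch (it assumes the esRDD helper that comes with the same org.elasticsearch.spark._ import; a plain curl against the index works just as well):
// Read the freshly written index back and print each (id, fields) pair.
val docs = sc.esRDD("spark/docs")
docs.collect().foreach(println)
// Or from a terminal: curl 'http://10.70.5.26:9280/spark/_search?pretty'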
<option value="sources"> Source Code [can build several Hadoop versions] </option>
<option value="without-hadoop"> Pre-build with user-provided Hadoop [can use with most Hadoop distributions] </option>
<option value="hadoop2.6"> Pre-built for Hadoop 2.6 and later </option>
<option value="hadoop2.4"> Pre-built for Hadoop 2.4 and later </option>
<option value="hadoop2.3"> Pre-built for Hadoop 2.3 </option>
<option value="hadoop1"> Pre-built for Hadoop 1.X </option>
<option value="cdh4"> Pre-built for CDH 4 </option>
我推荐Pre-built for Hadoop 2.6这个,其他自己愿意试就去试试吧.我下载without-hadoop这个版本弄了三天都没搞好.
坑1:官方给的elasticsearch-hadoop包里,有三个包含spark的jar包,其实他们都是骗子,为了兼容spark1.2 1.3之类的用的,真正有用的只有一个包:elasticsearch-hadoop-X.X.X.jar.并且你如果加载了不对的jar他也不给你提示,还会给你报莫名其妙的错误.艹!
坑2:新手用spark-shell测试的时候,他这个命令行会自作聪明帮你创建一个sc变量,而网上的教程都不是按照spark-shell这种交互的方式给你的代码,因此执行下面的语句会造成根本无效:
val conf = new SparkConf().setAppName("test").setMaster("local")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "10.70.5.26")
conf.set("es.port","9280")
而下面的语句会直接报错:
val sc = new SparkContext(conf)
这简直就是虐啊!正确的方式是啥呢?请看:
./spark-shell \
--jars /home/songrunpeng/elasticsearch-hadoop-2.1.3/dist/elasticsearch-hadoop-2.1.3.jar \
--conf spark.es.nodes=10.70.5.26 \
--conf spark.es.port=9280 \
--conf spark.es.nodes.discovery=false \
--conf spark.es.http.timeout=5m
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
以上代码是往es里存一个叫spark的index,存好后你可以去看看它的数据结构,OK暂时写这么多,跟es能通就可以按照别的教程走了吧.