1. Download Spark from the official website. If the version you need is not listed, click any download link, copy the resulting URL, and paste it into the browser address bar. Strip the filename from the end of the URL, open it, and you land in the parent directory, where you can browse for the version you want. If there is no build matching your Hadoop version (2.5.x, for example), download the "without hadoop" package and compile it yourself.
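For example, a sketch assuming the Apache archive layout (the exact URL and filename may differ for your version), old releases can be fetched directly:

[root@hadoop1 ~]# wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz
[root@hadoop1 ~]# tar -zxf spark-1.6.1-bin-without-hadoop.tgz

Opening https://archive.apache.org/dist/spark/ in a browser lists every released version, which is the same parent-directory trick described above.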
2. The machine needs Java, Hadoop, and Scala installed; if you intend to compile, Maven as well. The method for each is the usual one: unpack the tarball and edit the config files; search online for the details.
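As a minimal sketch of the environment setup (all install paths below are assumptions; substitute your own), the variables typically go into /etc/profile or ~/.bashrc:

export JAVA_HOME=/usr/local/jdk1.7.0      # assumed install path
export HADOOP_HOME=/usr/local/hadoop      # assumed install path
export SCALA_HOME=/usr/local/scala        # assumed install path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin

Afterwards run source /etc/profile (or open a new shell) and verify with java -version, hadoop version, and scala -version.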
3. Edit conf/spark-env.sh under the Spark directory and add: export SPARK_DIST_CLASSPATH=$(hadoop classpath). Those are parentheses, not curly braces.
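A fresh distribution ships only the template, so the usual sequence (run from the Spark directory) is to copy it and append the line:

[root@hadoop1 spark-1.6.1-bin-without-hadoop]# cp conf/spark-env.sh.template conf/spark-env.sh
[root@hadoop1 spark-1.6.1-bin-without-hadoop]# echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> conf/spark-env.sh

The parentheses matter because $(hadoop classpath) is command substitution: it runs the hadoop classpath command and splices its output into the variable, whereas ${...} would be plain variable expansion.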
Compiling with Maven:
[root@hadoop1 conf]# export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
[root@hadoop1 conf]# mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.1 -DskipTests clean package
4. Done. (Unfortunately, mine errored out.)
[WARNING] The requested profile "yarn" could not be activated because it does not exist.
[WARNING] The requested profile "hadoop-2.5" could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/root/spark-1.6.1-bin-without-hadoop/conf). Please verify you invoked Maven from the correct directory. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException
[root@hadoop1 conf]#
5. After digging around for quite a while, it seems no compilation is needed at all; just run spark-shell from spark/bin directly.
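A quick smoke test of that (directory name assumed from step 1; the sample job is just an illustration):

[root@hadoop1 ~]# cd spark-1.6.1-bin-without-hadoop
[root@hadoop1 spark-1.6.1-bin-without-hadoop]# ./bin/spark-shell
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0

If the shell comes up and the job runs, the SPARK_DIST_CLASSPATH setting from step 3 is finding the Hadoop jars.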
As for the errors above: they probably mean compiling requires the source code. Maven was invoked from /root/spark-1.6.1-bin-without-hadoop/conf, which is inside the pre-built binary package and has no pom.xml, hence the "no POM in this directory" error; the profile warnings likewise suggest this version has no hadoop-2.5 profile. Or maybe launching the shell directly has its own problems. Something to figure out next time.
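For reference, a hedged sketch of what the build would presumably look like from a source checkout (the source path is an assumption, and pairing the hadoop-2.4 profile with -Dhadoop.version for Hadoop 2.5.x follows the Spark 1.6 build documentation as I understand it):

[root@hadoop1 ~]# cd /root/spark-1.6.1    # source distribution containing pom.xml, not the binary package
[root@hadoop1 spark-1.6.1]# export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
[root@hadoop1 spark-1.6.1]# mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1 -DskipTests clean package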