A few months ago I spent some time tinkering with Jupyter (a web-based code editor) and found it quite flexible for writing certain kinds of scientific articles. It is, however, essentially a single-machine tool, which limits its use in production. Later I needed to run the code I was editing on a cluster, so after some research and more tinkering I got Jupyter working in a cluster environment.
Concretely, I can write a simple program and then execute the code on the cluster. The programs are in Python, so the cluster is a PySpark cluster.
Inside the Spark distribution there is a pyspark jar, and under examples/src/main there is also some PySpark example code.
Below is the installation tutorial.
There are many steps and many details, so I may have missed something. Moreover, this use of Jupyter has rarely been attempted before, and it relies on some fairly new components, so cross-reference other materials where needed.
(1) Install Jupyter.
Link:
http://blog.youkuaiyun.com/cafebar123/article/details/78636826
(2) The link above covers the Jupyter installation. Once it is finished, the main page offers Python and PySpark editors; here we use PySpark. The Python kernel passes code-snippet tests directly, including magic commands such as %matplotlib inline. The PySpark kernel, however, cannot use magic commands out of the box; an additional component must be installed: sparkmagic.
Install sparkmagic:
pip install sparkmagic
Make sure the ipywidgets notebook extension, which sparkmagic depends on, is enabled:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
Find where sparkmagic was installed:
pip show sparkmagic
Mine, for example, is /usr/local/lib/python2.7/site-packages.
Enter that directory and look inside the sparkmagic package:
cd sparkmagic
ll
cd kernels
ll
You should see:
pyspark3kernel
pysparkkernel
sparkkernel
sparkrkernel
wrapperkernel
Go back up to the site-packages directory (the parent of sparkmagic; the paths below are relative to it) and install the kernels one by one:
jupyter-kernelspec install sparkmagic/kernels/sparkkernel
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
jupyter-kernelspec install sparkmagic/kernels/pyspark3kernel
jupyter-kernelspec install sparkmagic/kernels/sparkrkernel
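To confirm the kernels were registered, you can list the kernel specs that Jupyter knows about; here is a quick check from Python (jupyter kernelspec list on the command line shows the same information):
from jupyter_client.kernelspec import KernelSpecManager
# The sparkmagic kernels installed above should show up in this listing.
for name, path in KernelSpecManager().find_kernel_specs().items():
    print(name, "->", path)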
Edit the config.json file:
sudo vi /home/infosouth/.sparkmagic/config.json
It is empty initially; add the following:
{ "kernel_python_credentials" : { "username": "", "password": "", "url": "http://Master:8990", "auth": "None" }, "kernel_scala_credentials" : { "username": "", "password": "", "url": "http://Master:8990", "auth": "None" }, "kernel_r_credentials": { "username": "", "password": "", "url": "http://Master:8990", }, "logging_config": { "version": 1, "formatters": { "magicsFormatter": { "format": "%(asctime)s\t%(levelname)s\t%(message)s", "datefmt": "" } }, "handlers": { "magicsHandler": { "class": "hdijupyterutils.filehandler.MagicsFileHandler", "formatter": "magicsFormatter", "home_path": "~/.sparkmagic" } }, "loggers": { "magicsLogger": { "handlers": ["magicsHandler"], "level": "DEBUG", "propagate": 0 } } }, "wait_for_idle_timeout_seconds": 15, "livy_session_startup_timeout_seconds": 60, "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.", "ignore_ssl_errors": false, "session_configs": { "driverMemory": "1000M", "executorCores": 2 }, "use_auto_viz": true, "coerce_dataframe": true, "max_results_sql": 2500, "pyspark_dataframe_encoding": "utf-8", "heartbeat_refresh_seconds": 30, "livy_server_heartbeat_timeout_seconds": 0, "heartbeat_retry_seconds": 10, "server_extension_default_kernel_name": "pysparkkernel", "custom_headers": {}, "retry_policy": "configurable", "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5], "configurable_retry_policy_max_retries": 8 }
The hostname Master used in config.json must be mapped in the hosts file; Master is also the alias of the cluster's master node.
Enable the sparkmagic server extension:
jupyter serverextension enable --py sparkmagic
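If the extension registered correctly, jupyter serverextension list should now include sparkmagic in its output.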
Once installation is complete, try a magic command in the pyspark editor:
lsmagic
and check whether the commands it lists actually work.
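For example, sparkmagic provides cell magics such as %%info, which shows the current Livy sessions, and %%sql, which runs Spark SQL against the session. Assuming the session starts cleanly, a cell like this should come back with a one-row result:
%%sql
SELECT 1 AS test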
PySpark test:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()  # in a sparkmagic/Livy session a context is usually pre-created as sc; skip this line there
sqlContext = SQLContext(sc)
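Once the context is up, a small end-to-end smoke test confirms that work is actually distributed to the cluster; a minimal sketch using the sc from above:
# Distribute a small dataset and run two actions on the cluster.
rdd = sc.parallelize(range(100))
print(rdd.count())  # expect 100
print(rdd.sum())    # expect 4950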