Anaconda is recommended: it is a precompiled scientific Python distribution. (Enthought is an alternative.)
Download Anaconda-2.3.0-Linux-x86_64.sh and copy it to the
/usr/local/apps
directory, then run bash Anaconda-2.3.0-Linux-x86_64.sh
to complete the installation. Update the environment variables: in
/home/hadoop/.bashrc
add export PATH=/usr/local/apps/anaconda/bin:$PATH
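After opening a new shell (or running source /home/hadoop/.bashrc), a quick sanity check in a Python session confirms that the Anaconda interpreter is the one being picked up; this is a minimal sketch assuming the /usr/local/apps/anaconda prefix used in the PATH line above:
# Sanity check: the interpreter on PATH should be the Anaconda one.
# The /usr/local/apps/anaconda prefix is the one assumed by this guide.
import sys
print(sys.executable)   # expected: /usr/local/apps/anaconda/bin/python
print(sys.version)      # the version string should mention Anaconda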
If you are using IPython 3.0, update it as follows:
conda update conda
conda update ipython ipython-notebook ipython-qtconsole
Then install or update jupyter (for IPython 3.x and above):
conda install jupyter
or conda update jupyter
Apply the same configuration to every machine in the cluster, start Hadoop and Spark, then launch the pyspark shell with the following command:
IPYTHON=1 IPYTHON_OPTS="--pylab" ./bin/pyspark
This starts pyspark with both IPython and pylab enabled.
If output like the following appears, the shell started successfully:
IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: Qt4Agg
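At this prompt the SparkContext is already available as sc, and pylab's plotting functions are imported into the namespace. A small smoke test, typed at the IPython prompt (a minimal sketch; the RDD contents are arbitrary), might look like this:
# sc is the SparkContext created by pyspark; hist/show come from pylab.
data = sc.parallelize(range(1000))          # small RDD for a smoke test
squares = data.map(lambda x: x * x)         # trivial transformation
print(squares.sum())                        # forces the job to run: 332833500
hist(squares.sample(False, 0.1).collect())  # plot a sample with pylab
show()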
Problem encountered:
When connecting to the host remotely over Xshell, the same command fails with a "cannot connect to X server" error; this has not been resolved yet.
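The error most likely comes from the Qt4Agg backend trying to open a display window, which the Xshell session does not provide. A possible workaround, sketched here but not verified on this cluster, is to launch with IPYTHON=1 ./bin/pyspark (without the --pylab option) and select the headless Agg backend inside the shell, writing figures to files instead of windows:
# Untested sketch: the Agg backend needs no X server.
import matplotlib
matplotlib.use("Agg")              # must run before pyplot is imported
import matplotlib.pyplot as plt
values = sc.parallelize(range(1000)).map(lambda x: x * x).collect()
plt.hist(values, bins=50)
plt.savefig("/tmp/squares_hist.png")   # example path; copy the file locally to view it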