[Spark] Using Spark (thrift-server) in a Django Project

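Simplifying PySpark database operations with pyhs2
This article describes how to use the pyhs2 driver to simplify PySpark database operations: starting thrift-server, testing the connection, connecting and querying from a Python script, and an example of using it in Django. Compared with PySpark, pyhs2 offers a more convenient and consistent interface.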

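I spent a good part of this morning wrestling with pyspark, not realizing that tools as handy as thrift-server and pyhs2 existed.

They are simpler to use than pyspark, and the experience feels like working with an ordinary database: the connect, cursor, and execute methods are used just as they are when connecting to PostgreSQL with psycopg2. Of course, this is only my impression from a first, shallow attempt.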


1. First, start thrift-server. The commands look roughly like this:

cd /usr/local/spark/sbin

sudo ./start-thriftserver.sh --master local
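
Once the script has run, a plain TCP check from Python is a quick way to confirm that the server is accepting connections. This is only a minimal sketch: it assumes the server runs on localhost and listens on port 10000 (the usual HiveServer2 Thrift default, unless hive.server2.thrift.port has been changed), and the function name is made up for illustration.

```python
import socket


def thrift_server_is_up(host="localhost", port=10000, timeout=3.0):
    """Return True if something accepts TCP connections on host:port.

    The default host and port are assumptions; adjust them to your setup.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    sock.close()
    return True
```

A True result only shows that the port is open; it does not prove that the listener is thrift-server, so the beeline test in the next step is still worth doing.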


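2. Once the server has started successfully, you can test the JDBC connection with beeline, which is in the spark/bin directory.

[Screenshot: beeline JDBC connection test]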



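3. So how does a Python script connect and run queries?

It goes through the pyhs2 driver, which you can think of as a "database connection driver".

It can be downloaded from: https://github.com/BradRuderman/pyhs2

In case GitHub becomes inaccessible in the future, I plan to upload a copy to CSDN shortly.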


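pyhs2 is installed like any other third-party Python library: after unpacking it, run python setup.py install, and it is installed under site-packages in the Python installation directory.

Note that it depends on the sasl package, so run apt-get install libsasl2-dev before installing it.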
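
After installation, a small import check can confirm that the driver and its dependencies can be loaded. This is a hedged sketch: the helper name is hypothetical, and the default module names (sasl, thrift, pyhs2) are my assumption about what the driver needs at import time.

```python
import importlib


def missing_modules(names=("sasl", "thrift", "pyhs2")):
    """Return the subset of `names` that cannot be imported.

    The default module names are assumptions, not a verified dependency list.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

An empty list means every listed module was importable. If sasl is in the result, the libsasl2-dev step above is the first thing to recheck.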


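4. Now it is ready to use. The original screenshot showed a session in the Python command line:

[Screenshot: pyhs2 session in the Python command line]

As it shows, all the methods are almost exactly the same as those used to connect to an ordinary database. In use I could not feel any difference.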
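
Since the screenshot is not available, here is a minimal sketch of what such a session might look like as a script, based on the connect/cursor/execute pattern described above. The host, port, user, password, and database values are placeholders, and the helper names are hypothetical. This is not the author's original code.

```python
def default_connect():
    """Open a pyhs2 connection; every connection value here is a placeholder."""
    import pyhs2  # imported lazily so this sketch loads even without the driver

    return pyhs2.connect(
        host="localhost",
        port=10000,
        authMechanism="PLAIN",
        user="root",
        password="test",
        database="default",
    )


def run_query(sql, connect=default_connect):
    """Execute one SQL statement and return all result rows.

    `connect` is any zero-argument callable that returns a connection
    usable as a context manager, which makes it easy to swap in a fake.
    """
    with connect() as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetch()
```

With thrift-server running, calling run_query("show tables") would return the rows as a list, in much the same way as fetching all rows through psycopg2.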


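5. Back to the title: using it in Django.

a. Because a new third-party library has been installed into Python, it has to be added in Eclipse as well; otherwise the editor reports errors while you write code.

Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter -> New Egg/Zip(s)

Add the newly installed pyhs2, as shown in the screenshot: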


b. Then we write the code that reads the table:
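
The original code was shown only as a screenshot, so the following is a hedged sketch of what a table-reading view might look like, not the author's implementation. The table name my_table, the view name, and all connection values are hypothetical. The Django and pyhs2 imports are deferred into the view so that the helper can be exercised on its own.

```python
import re

# Conservative pattern for "table" or "database.table" identifiers.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def read_table(connect, table, limit=10):
    """Return up to `limit` rows of `table` as a list of rows."""
    # The table name is formatted into the SQL string, so validate it first.
    if not _TABLE_NAME.match(table):
        raise ValueError("unsafe table name: %r" % (table,))
    sql = "SELECT * FROM %s LIMIT %d" % (table, int(limit))
    with connect() as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetch()


def table_preview(request):
    """Django view (hypothetical name) that returns the rows as JSON."""
    import pyhs2
    from django.http import JsonResponse

    def connect():
        # Placeholder connection values; replace with your own.
        return pyhs2.connect(host="localhost", port=10000,
                             authMechanism="PLAIN", user="root",
                             password="test", database="default")

    rows = read_table(connect, "my_table")
    return JsonResponse({"rows": rows})
```

Keeping the query logic in read_table, separate from the view, means it can be tested with a fake connection and reused across several views.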


c. The result of running it:

[Screenshot: output of the Django view]


