py4j.protocol.Py4JJavaError: An error occurred while calling o39.colStats.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/worker.py", line 166, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/worker.py", line 55, in read_command
command = serializer._read_with_length(file)
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
return self.loads(obj)
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/serializers.py", line 451, in loads
return pickle.loads(obj, encoding=encoding)
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 784, in _make_skel_func
closure = _reconstruct_closure(closures) if closures else None
File "/opt/modules/spark-2.2.0/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 776, in _reconstruct_closure
return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
--------------------------------------------
Problem description:
The code runs normally when executed in a pyspark shell opened from a local terminal, but packaging the same code as a Spark application and running it on the cluster produces the error above. A minimal repro sketch follows.
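For reference, this is a sketch of the kind of application that triggers the failing call in the log (o39.colStats points at a Statistics.colStats invocation); the input vectors here are illustrative, not the original data:

from pyspark import SparkContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.stat import Statistics

sc = SparkContext(appName="colstats-repro")

# colStats ships a pickled command to the executors; when the driver and
# executors carry mismatched pyspark/cloudpickle versions, deserializing
# that command on the worker is what raises the TypeError above.
rdd = sc.parallelize([Vectors.dense([1.0, 2.0, 3.0]),
                      Vectors.dense([4.0, 5.0, 6.0])])
summary = Statistics.colStats(rdd)
print(summary.mean(), summary.variance(), summary.count())

sc.stop()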
Solution:
The likely cause is a version mismatch. Installing the pyspark package with pip pulls in the latest release by default, while the pyspark shell started from the terminal uses the copy bundled with the Spark installation (the /opt/modules/spark-2.2.0/python paths in the traceback). A submitted application, however, imports the pip-installed package, so the driver pickles closures with a newer cloudpickle than the one the Spark 2.2.0 workers use to unpickle them, and the incompatible closure format surfaces as the TypeError during deserialization.
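To confirm the mismatch before reinstalling anything, compare the version of the pyspark package the driver imports against the version of Spark the cluster actually runs (the app name below is just a placeholder):

import pyspark
from pyspark import SparkContext

sc = SparkContext(appName="version-check")
print("pyspark package:", pyspark.__version__)  # version pip installed
print("cluster Spark:  ", sc.version)           # version the cluster runs, e.g. 2.2.0
sc.stop()

If the two differ, either pin the package to the cluster version (pip install pyspark==2.2.0) or uninstall it and put $SPARK_HOME/python together with the py4j source zip under $SPARK_HOME/python/lib on PYTHONPATH, so that the shell and submitted applications share the same bundled copy.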