@创建于:2022.06.15
@修改于:2022.06.15
利用Spark的yarn模式(把.py文件上传到hadoop平台),执行过程中发现了下面的问题。py4j.protocol.Py4JJavaError: An error occurred while calling o91.showString.
代码段如下:
df = spark.sql("select * from test.test_table")
print(type(df))
df.printSchema()
df.show(3)
log日志如下:
Traceback (most recent call last):
File "/tmp/spark-0d83b9e6-2801-4a72-9713-5e3fba6f4cbe/main.py", line 44, in <module>
df.show(3)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 378, in show
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o91.showString.
问题解决:
test.test_table中是个空表,增加四条数据,就可以解决上述问题。
一点感受:
不能把python DataFrame的思想完全套用在Spark DataFrame中。