rdd = sc.parallelize([("Sam", 28, 88), ("Flora", 28, 90), ("Run", 1, 60)])
df = rdd.toDF(["name", "age", "score"])
df.show()
sc.stop()
I wanted to create a Spark DataFrame from an RDD, but the snippet above raised an error: toDF() is only attached to RDDs once a SparkSession has been created.
Solution: add the three setup lines below (plus the SparkContext import) before building the RDD.
from pyspark import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext()
SparkSession(sc)  # instantiating a SparkSession patches toDF() onto RDDs, so sc's PipelinedRDDs can be converted
rdd = sc.parallelize([("Sam", 28, 88), ("Flora", 28, 90), ("Run", 1, 60)])
df = rdd.toDF(["name", "age", "score"])
df.show()
sc.stop()
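As a side note, on PySpark 2.x and later the same conversion is usually written with the SparkSession builder, which creates the session and its underlying SparkContext in one call. A minimal sketch, assuming a standalone script (the app name here is a placeholder, not from the original):

from pyspark.sql import SparkSession

# Builder pattern: getOrCreate() returns the existing session or starts a new one
spark = SparkSession.builder.appName("rdd_to_df_demo").getOrCreate()
sc = spark.sparkContext  # the SparkContext is created for us

rdd = sc.parallelize([("Sam", 28, 88), ("Flora", 28, 90), ("Run", 1, 60)])
df = rdd.toDF(["name", "age", "score"])
df.show()
spark.stop()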
If re-running the fix raises the error below, a SparkContext is already alive; call sc.stop() once and try again:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=test_SamShare, master=local[4]) created by
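A related guard worth knowing: SparkContext.getOrCreate() reuses a running context instead of raising this ValueError. A minimal sketch, assuming you only ever need one shared context:

from pyspark import SparkContext

# Returns the existing SparkContext if one is running, otherwise creates it,
# so "Cannot run multiple SparkContexts at once" never fires.
sc = SparkContext.getOrCreate()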