Using PyCharm to implement Spark SQL.
from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructField, StringType, StructType

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("app name") \
        .master("local") \
        .getOrCreate()
    sc = spark.sparkContext

    # Read the '|'-delimited text file into an RDD of field lists
    line = sc.textFile("D:\\data\\demo.txt").map(lambda x: x.split('|'))

    # Alternative: let Spark infer the schema from named Row objects
    # personRdd = line.map(lambda p: Row(id=p[0], name=p[1], age=int(p[2])))
    # personRdd_tmp = spark.createDataFrame(personRdd)
    # personRdd_tmp.show()

    # Build the schema explicitly: every column is a nullable string
    schemaString = "id name age"
    fields = [StructField(fieldName, StringType(), nullable=True)
              for fieldName in schemaString.split(" ")]
    schema = StructType(fields)

    # Convert each field list into a Row and apply the schema
    rowRDD = line.map(lambda attributes: Row(attributes[0], attributes[1], attributes[2]))
    peopleDF = spark.createDataFrame(rowRDD, schema)

    # Register the DataFrame as a temporary view so it can be queried with SQL
    peopleDF.createOrReplaceTempView("people")
    # Example query against the registered view
    spark.sql("SELECT id, name, age FROM people").show()
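As a quick sanity check on the parsing logic above, the split-and-schema steps can be mirrored in plain Python without a Spark cluster. This is only a sketch: the sample lines below are an assumption about demo.txt's '|'-delimited id|name|age format, not taken from the actual file.

```python
# Hypothetical sample of what D:\data\demo.txt might contain,
# assuming each line is '|'-delimited as id|name|age.
sample_lines = ["1|Alice|30", "2|Bob|25"]

# Mirrors: sc.textFile(...).map(lambda x: x.split('|'))
parsed = [x.split('|') for x in sample_lines]
print(parsed[0])   # ['1', 'Alice', '30']

# Mirrors the explicit schema "id name age" (all string columns)
field_names = "id name age".split(" ")
records = [dict(zip(field_names, attrs)) for attrs in parsed]
print(records[1])  # {'id': '2', 'name': 'Bob', 'age': '25'}
```

Note that with the explicit `StringType` schema every value stays a string (including `age`); casting to integers is what the commented-out `Row(..., age=int(p[2]))` variant handles.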

This post walks through Spark SQL programming with Python in a PyCharm environment, covering the key steps of setting up the environment, creating a DataFrame, and executing SQL queries.