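As the error shows, joining two streaming DataFrames is not supported here.
Replacing the join with a window function still fails, because streaming DataFrames do not support non-time-based window functions: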
Non-time-based windows are not supported on streaming DataFrames/Datasets;
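So the computation is done in a function instead. The function is not a global variable, though, so this approach still has some limitations.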
import pyspark.sql.functions as F

# First argument: the data of the current batch; second argument: the batch id
def getMinScore(df, batch_id):
    # Here df is a static (non-streaming) DataFrame
    # df.createOrReplaceTempView("temp_answer2")
    # spark.sql("""
    #     select * from temp_answer2
    # """).show()
    df.show()
    print(df.isStreaming)
    # minDf = df.groupBy("student_id").min("score")
    # Minimum score per student within this batch
    minDf = df.groupBy("student_id").agg(F.min("score").alias("min_score"))
    minDf.createOrReplaceTempView("student_min_score")
    minDf.show()
    # Join the minimum back onto the batch rows so both columns are available
    resultDf = df.join(minDf, "student_id")
    resultDf.show()
    # resultDf.createOrReplaceTempView("result_answer")
    # Keep only the rows whose score equals the student's minimum
    resultDf.where("score=min_score").select("student_id", "question_id", "min_score").show()

answerDf.writeStream.foreachBatch(getMinScore).outputMode("append").start().awaitTermination()
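
One likely reading of the limitation mentioned above is that each call only sees its own batch, so the minimum it reports is per batch rather than over the whole stream. The plain-Python sketch below mimics that behavior without Spark. The sample rows and the helper names `rows_with_min_score` and `update_running_min` are hypothetical and not part of the original code; the running-minimum dictionary is only one possible way to carry state across batches.

```python
def rows_with_min_score(batch):
    """Return (student_id, question_id, min_score) for the rows that hold
    each student's minimum score within this single batch."""
    min_score = {}
    for row in batch:
        sid = row["student_id"]
        if sid not in min_score or row["score"] < min_score[sid]:
            min_score[sid] = row["score"]
    return [
        (row["student_id"], row["question_id"], row["score"])
        for row in batch
        if row["score"] == min_score[row["student_id"]]
    ]


def update_running_min(state, batch):
    """Fold one batch into a dict of per-student minima that outlives the batch."""
    for row in batch:
        sid = row["student_id"]
        state[sid] = min(state.get(sid, row["score"]), row["score"])
    return state


# Hypothetical micro-batches standing in for the streaming input
batch_0 = [
    {"student_id": "s1", "question_id": "q1", "score": 80},
    {"student_id": "s1", "question_id": "q2", "score": 60},
    {"student_id": "s2", "question_id": "q1", "score": 90},
]
batch_1 = [
    {"student_id": "s1", "question_id": "q3", "score": 70},
    {"student_id": "s2", "question_id": "q2", "score": 50},
]

# Per-batch results, analogous to what the foreachBatch function prints
per_batch_0 = rows_with_min_score(batch_0)
per_batch_1 = rows_with_min_score(batch_1)

# Minima accumulated across both batches
running = {}
for batch in (batch_0, batch_1):
    update_running_min(running, batch)

print(per_batch_0)
print(per_batch_1)
print(running)
```

In this sample the second batch reports 70 as the minimum for `s1`, even though the first batch already contained a 60. Only the accumulated dictionary reflects the minimum over everything seen so far.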