Spark error: The pivot column feature has more than 10000 distinct values

小白白白又白cdllp

Published 2020-07-27 18:26:47

Category: Big Data. Tags: spark

Article link: https://blog.youkuaiyun.com/weixin_39750084/article/details/107618042

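When converting a narrow (long) table to a wide table with PySpark, the pivot failed with an error saying the pivot column has more than 10000 distinct values. Raising the spark.sql.pivotMaxValues parameter to 30000 resolved the problem.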

(Author: 陈玓玏 data-master)

While converting a narrow table to a wide table with pyspark, the following error appeared:

pyspark.sql.utils.AnalysisException: 
u'The pivot column feature has more than 10000 distinct values, 
this could indicate an error. 
If this was intended, 
set spark.sql.pivotMaxValues to at least 
the number of distinct values of the pivot column.;'
