It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
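If your PySpark code has two nested levels of map and the inner map uses sc (SparkContext), this error is raised. The reason is that sc can only be used in your main function, which runs on the driver; it cannot be called from inside a map that runs in parallel on the workers.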
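So you need to collapse the two levels of map into a single level, or stop using sc in the inner level.
Flatten all the data at the outermost level instead, using flatMap or join.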
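
A minimal sketch of the flatMap approach, assuming each outer record holds a key and a list of inner items. The names records, expand, process, and run are hypothetical and only for illustration; the real data and per-item logic will differ. The point is that sc is touched only inside run, which executes on the driver, and none of the functions passed to flatMap or map refer to it.

```python
from itertools import chain

# Hypothetical input: each outer record is (key, list of inner items).
records = [("a", [1, 2]), ("b", [3])]


def expand(record):
    """Flatten one outer record into (key, item) pairs. Does not use sc."""
    key, items = record
    return [(key, item) for item in items]


def process(pair):
    """Per-item work that used to live in the inner map. Does not use sc."""
    key, item = pair
    return (key, item * 10)


def run(sc):
    """Driver-side code: sc is an existing SparkContext, used only here."""
    # Problematic pattern (triggers the SPARK-5063 error):
    #   sc.parallelize(records).map(
    #       lambda r: sc.parallelize(r[1]).map(...).collect())
    # Single-level version: flatten first, then map once.
    return sc.parallelize(records).flatMap(expand).map(process).collect()


# The same flatten-then-map logic in plain Python, to check it without Spark.
local_result = [process(p) for p in chain.from_iterable(expand(r) for r in records)]
```

If the inner map used sc to load a second dataset, one option is to load that dataset as a keyed RDD on the driver and combine the two with join rather than looking it up per record.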