1. Running the Python script fails with TypeError: sequence item 1: expected string or Unicode, int found. The full error output is:
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/src/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/usr/local/src/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/src/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
for obj in iterator:
File "/usr/local/src/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1494, in func
File "/sunxj/spark/pyspark/wordcount_1.py", line 18, in <lambda>
result = rdd.flatMap(f).map(lambda word:(word,1)).reduceByKey(lambda a,b:a+b).sortBy(lambda x:x[1],ascending=False).map(lambda x:'\t'.join([x[0],x[1]]))
TypeError: sequence item 1: expected string or Unicode, int found
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

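2. This error means that the output step passes an int to join, which accepts only strings. In the final map on line 18 of wordcount_1.py (the line quoted in the traceback above), x[0] is the word and x[1] is the count produced by reduceByKey, so x[1] is a number, not a string.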
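The failure can be reproduced without Spark. The following is a minimal sketch, assuming a hypothetical `(word, count)` pair such as the ones `reduceByKey` emits; the sample values are made up for illustration. Note that the exact wording of the message depends on the Python version: the traceback above comes from Python 2, while Python 3 reports `expected str instance, int found`.

```python
# Hypothetical (word, count) pair, shaped like the output of reduceByKey.
pair = ("spark", 3)

try:
    # Same expression as the failing lambda: item 1 of the list is an int.
    line = '\t'.join([pair[0], pair[1]])
except TypeError as exc:
    # str.join only accepts strings, so the int at index 1 is rejected.
    error_message = str(exc)
    print(error_message)
```

The "sequence item 1" part of the message is the index of the offending element in the list passed to `join`, which points directly at `x[1]`.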

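3. To fix it, convert x[1] to a string with str before joining, that is, change the final map to map(lambda x: '\t'.join([x[0], str(x[1])])).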
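As a sketch of the fix, the conversion can also be pulled out into a small named function. The name `format_pair` is hypothetical and not part of the original script; the commented pipeline at the end simply repeats the line from the traceback with the final lambda swapped out, and assumes `rdd` and `f` are defined as in wordcount_1.py.

```python
def format_pair(pair):
    """Format a (word, count) pair as a tab-separated line.

    str() converts the integer count, so join only ever sees strings.
    """
    word, count = pair
    return '\t'.join([word, str(count)])


# Plain-Python check with made-up sample data.
print(format_pair(("spark", 3)))

# Assumed usage in the original pipeline (rdd and f come from wordcount_1.py):
# result = (rdd.flatMap(f)
#              .map(lambda word: (word, 1))
#              .reduceByKey(lambda a, b: a + b)
#              .sortBy(lambda x: x[1], ascending=False)
#              .map(format_pair))
```

Converting only at the output step keeps the count as an int for `reduceByKey` and `sortBy`, so the sort stays numeric rather than lexicographic.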
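This article resolves a TypeError encountered when programming Spark in Python: it explains the cause of the error and gives the code change needed so that values are converted to the correct type before output.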