Locate the JSON file
/usr/local/spark/examples/src/main/resources/people.json
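For reference, this sample file ships with the Spark distribution and holds one JSON object per line, consistent with the parsed output shown further down:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}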
Write the application jsontext.py
from pyspark import SparkContext
import json

# Create a local SparkContext for this application
sc = SparkContext('local', 'JSONAPP')
inputFile = "file:///usr/local/spark/examples/src/main/resources/people.json"
# Read the file as an RDD of strings, one JSON object per line
jsonStrs = sc.textFile(inputFile)
# Parse each line into a Python dict
result = jsonStrs.map(lambda s: json.loads(s))
result.foreach(print)
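Note that result.foreach(print) runs on the executors, so when the job is submitted to a real cluster (rather than the 'local' master set in the code above) the printed lines end up in the executor logs instead of the console. For a small dataset like this one, a sketch that prints on the driver instead could look like this:

# Bring the parsed dicts back to the driver and print them there
for record in result.collect():
    print(record)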
Run with spark-submit and view the output
root@master:~/pysparkfile# spark-submit --master yarn /root/pysparkfile/jsontext.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/spark/jars/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
18/04/22 12:41:07 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
{'name':'Michael'}
{'age':30, 'name': 'Andy'}
{'age':19, 'name': 'Justin'}
root@master:~/pysparkfile#
This article shows how to read a JSON file in Spark with Python: create a SparkContext, point it at the input file, parse each line with map plus json.loads, and print the results with foreach. Although the example run hit SparkUI port conflicts (ports 4040 through 4046 were already in use, so Spark kept trying the next port), it still produced the JSON records successfully.
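As a side note, Spark also exposes a DataFrame reader for line-delimited JSON, which skips the manual json.loads step. A minimal sketch, assuming the same file path (the application name here is only illustrative):

from pyspark.sql import SparkSession

# Build a SparkSession; the appName is a placeholder for this sketch
spark = SparkSession.builder.appName("JSONAPP-DF").getOrCreate()
# Let Spark infer the schema (age, name) from the JSON lines
df = spark.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
df.show()  # prints the records as a table with age and name columns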