Problem 1:
Running pyspark as the root user works fine.
Running it as the hadoop user produces the following error:
df1 = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("collection","devices").load()
17/02/15 19:34:26 WARN MongoInferSchema: Field 'devcaps' contains conflicting types converting to StringType
17/02/15 19:34:27 ERROR PoolWatchThread: Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.setReadOnly(Unknown Source)
at com.jolbox.bonecp.ConnectionHandle.setReadOnly(ConnectionHandle.java:1324)
Looking for a solution.
...
Running spark-submit as the hadoop user does not produce the error, so the cause may be a bug in pyspark.
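Before treating this as a pyspark bug, one thing worth ruling out is file permissions. This is an assumption, not something confirmed in the logs above: the Derby frames in the stack trace suggest an embedded Derby database is being opened, and Derby is known to raise this read-only error when it cannot write to its database directory. If root ran pyspark first from the same working directory, the `metastore_db` directory and `derby.log` file there may be owned by root. A minimal sketch of a check to run as the hadoop user from the directory where pyspark is launched (the helper name `check_derby_writable` is made up for illustration):

```python
import os


def check_derby_writable(base_dir="."):
    """Report whether the current user can write to the places embedded Derby uses.

    Returns a dict mapping each name to True (writable), False (not writable),
    or None (does not exist yet). The file names are the defaults Derby creates
    in the working directory; a customised setup may use other locations.
    """
    results = {}
    for name in (".", "metastore_db", "derby.log"):
        target = os.path.join(base_dir, name)
        if os.path.exists(target):
            # os.access checks permissions for the user running this script
            results[name] = os.access(target, os.W_OK)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, writable in check_derby_writable().items():
        if writable is None:
            status = "missing"
        elif writable:
            status = "writable"
        else:
            status = "NOT writable"
        print(name, "->", status)
```

If anything is reported as not writable, starting pyspark from a directory the hadoop user owns, or fixing the ownership of those files, would be the next thing to try. If everything is writable, the cause lies elsewhere.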
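This post looks at a problem the hadoop user runs into when reading MongoDB data with PySpark. It records in detail a data type conflict warning and a database connection read-only mode error, and attempts to offer a solution.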