Fixing "An error occurred while calling o45.jdbc.: scala.MatchError: null" when writing to MySQL from PySpark

License: CC 4.0 BY-SA

Tags: #pyspark #MySQL #python


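This post describes an error encountered when connecting to MySQL from PySpark and writing data, together with its fix. The error is raised when a Spark DataFrame is written to a MySQL table; the full error log and the working code are given below.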

When I used PySpark to connect to MySQL and write a simple Spark DataFrame to a MySQL table, the write failed with this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.: scala.MatchError: null

(1) Error message:

Fri Jul 13 16:22:56 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Traceback (most recent call last):
  File "/Users/a6/Downloads/speiyou_di/hive/log_task/111.py", line 47, in <module>
    df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})
  File "/Library/Python/2.7/site-packages/pyspark/sql/readwriter.py", line 765, in jdbc
    self._jwrite.mode(mode).jdbc(url, table, jprop)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Library/Python/2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.
: scala.MatchError: null
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
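
The last PySpark frame in the traceback above, `self._jwrite.mode(mode).jdbc(url, table, jprop)`, suggests a likely cause: `jdbc()` forwards its own `mode` parameter to the JVM writer, and when that parameter is left at its default of `None` it would replace the mode set earlier with `.mode("append")`. The sketch below is a simplified pure-Python model of that behaviour, not PySpark's actual code. `FakeJvmWriter`, `FakePyWriter` and `try_write` are hypothetical names, and a `ValueError` plays the role of `scala.MatchError`.

```python
VALID_MODES = ("append", "overwrite", "error", "ignore")


class FakeJvmWriter(object):
    """Hypothetical stand-in for the JVM-side writer (not a Spark class)."""

    def __init__(self):
        self.save_mode = "error"  # the JVM default when no mode is set

    def mode(self, save_mode):
        # Stores whatever it is given; None models a Java null.
        self.save_mode = save_mode
        return self

    def jdbc(self, url, table, props):
        # Models a pattern match that has no case for null.
        if self.save_mode not in VALID_MODES:
            raise ValueError("MatchError: %r" % (self.save_mode,))
        return self.save_mode


class FakePyWriter(object):
    """Hypothetical stand-in for the Python-side writer wrapper."""

    def __init__(self):
        self._jwrite = FakeJvmWriter()

    def mode(self, save_mode):
        self._jwrite = self._jwrite.mode(save_mode)
        return self

    def jdbc(self, url, table, mode=None, properties=None):
        # Same shape as the line shown in the traceback.
        return self._jwrite.mode(mode).jdbc(url, table, properties)


def try_write(use_jdbc_mode_argument):
    """Return the effective save mode, or the error text if the write fails."""
    url = "jdbc:mysql://localhost:3306/spark_db"
    writer = FakePyWriter()
    try:
        if use_jdbc_mode_argument:
            return writer.jdbc(url, "test_person", mode="append")
        return writer.mode("append").jdbc(url, "test_person")
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_write(use_jdbc_mode_argument=False))  # mode set via .mode() only
    print(try_write(use_jdbc_mode_argument=True))   # mode passed to jdbc()
```

In this model the first call fails because the `None` default overwrites `"append"`, while the second call keeps the mode it was given.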

(2) Failing code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding("utf-8")
# Set SPARK_HOME
import os
os.environ["SPARK_HOME"] = "/Users/a6/Applications/spark-2.1.0-bin-hadoop2.6"

from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext(appName="pyspark mysql demo")
sqlContext = SQLContext(sc)

# Open a connection and read the existing data

# Local test
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# Print the data
print "\nstep 1: dataframe_mysql.collect()\n", dataframe_mysql.collect()
dataframe_mysql.registerTempTable("temp_table")
dataframe_mysql.show()  # show() prints the rows itself and returns None
print dataframe_mysql.count()

print "step 2: prepare the data to write"

from pyspark.sql.types import *

# User-defined schema for the rows to be written.
schema = StructType([StructField("name", StringType()), StructField("age", IntegerType())])

# Build the DataFrame from a list of dicts, using the schema defined above.
d = [{'name': 'Alice1', 'age': 1}, {'name': 'tome1', 'age': 20}]
df1 = sqlContext.createDataFrame(d, schema)

# Display the contents of the DataFrame.
df1.show()

# Display the schema of the DataFrame.
df1.printSchema()

print "step 3: write the data"

# Local test
# Failing code A: mode is set with .mode(), jdbc() gets no mode argument
df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

# Working code B: mode is passed to jdbc() directly
#df1.write.jdbc(mode="overwrite", url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

print "step 4: write succeeded, read the data back to verify"
df1.show()

# Local test
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# Print the data
print "dataframe_mysql.collect()\n", dataframe_mysql.collect()

print "step 5: all steps completed successfully"
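
The warning at the top of the error log comes from the MySQL driver and asks for an explicit `useSSL` setting, and the listing above puts the password into the JDBC URL. As a minimal sketch, assuming a local test database where SSL is not needed, the URL and the connection properties could be built separately. `build_jdbc_target` is a hypothetical helper name, and passing `user` and `password` through `properties` is one option, not what the original script does.

```python
def build_jdbc_target(host, port, database, user, password, use_ssl=False):
    """Return (url, properties) for a MySQL JDBC connection.

    Hypothetical helper: the URL carries only the useSSL flag, and the
    credentials travel in the properties dict instead of the URL.
    """
    ssl_flag = "true" if use_ssl else "false"
    url = "jdbc:mysql://%s:%d/%s?useSSL=%s" % (host, port, database, ssl_flag)
    properties = {
        "user": user,
        "password": password,
        "driver": "com.mysql.jdbc.Driver",  # driver class used in the post
    }
    return url, properties


if __name__ == "__main__":
    url, properties = build_jdbc_target(
        "localhost", 3306, "spark_db", "root", "yyz!123456")
    print(url)
    # The pair can then be handed to the writer, for example:
    # df1.write.jdbc(url=url, table="test_person", mode="append",
    #                properties=properties)
```

Setting `useSSL=false` only silences the warning for a setup like this local test; a production database would more likely need `useSSL=true` with a truststore, as the warning itself says.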

(3) Solution

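Replace [failing code A] with [working code B] and the script runs successfully. The two calls differ only in where the save mode is given: A sets it with .mode("append") and then calls jdbc() without a mode argument, while B passes mode to jdbc() directly. The traceback shows jdbc() calling self._jwrite.mode(mode) with its own mode parameter, so in A that parameter is presumably left at None and reaches the JVM as null, which would explain scala.MatchError: null. Note that B also switches the mode from "append" to "overwrite", which replaces the existing contents of the table.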
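
To keep the append behaviour that code A was aiming for, the mode can be passed to jdbc() as "append" instead of "overwrite". The sketch below wraps that call so the mode can never be left out. `write_jdbc` and `SAVE_MODES` are hypothetical names, and the list of modes is limited to the four used in this sketch.

```python
SAVE_MODES = ("append", "overwrite", "ignore", "error")


def write_jdbc(df, url, table, mode, properties):
    """Write df over JDBC, always passing the save mode to jdbc() itself.

    Hypothetical wrapper: df is expected to be a Spark DataFrame (anything
    with a .write.jdbc(...) method works).
    """
    if mode not in SAVE_MODES:
        raise ValueError("unsupported save mode: %r" % (mode,))
    # The mode goes into jdbc() rather than into a separate .mode() call.
    df.write.jdbc(url=url, table=table, mode=mode, properties=properties)


# Usage with the DataFrame from the listing (needs Spark and MySQL running):
# write_jdbc(
#     df1,
#     "jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456",
#     "test_person",
#     "append",
#     {"driver": "com.mysql.jdbc.Driver"},
# )
```

Making `mode` a required argument is a design choice for this sketch: it avoids relying on a default that the failing call shows can end up as null on the JVM side.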

(4) Reproducing the error

First, create the database spark_db locally and the table test_person inside it, as follows:

create database spark_db;
use spark_db;

CREATE TABLE `test_person` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

insert into test_person(name,age) values('yyz',18);

Reference: https://stackoverflow.com/questions/49391933/pyspark-jdbc-write-error-an-error-occurred-while-calling-o43-jdbc-scala-matc