pyspark 1.6.2 API http://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameWriter.jdbc
1. Using MySQL as the example database
url = "jdbc:mysql://localhost:3306/test"
table = "test"
properties = {"user": "root", "password": "111111"}
# properties must be passed as a keyword argument: the third positional
# parameter of read.jdbc is column, and of write.jdbc is mode
df = sqlContext.read.jdbc(url, table, properties=properties)  # read
df.write.jdbc(url, table, properties=properties)              # write
# When writing, an RDD must first be converted to a DataFrame, e.g. with rdd.toDF()
# If the imported data contains Chinese text:
#   1) set the MySQL table's encoding to utf8:
#      ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
#   2) add the encoding parameters to the URL:
#      url = "jdbc:mysql://127.0.0.1:3306/goddness?useUnicode=true&characterEncoding=utf-8"
#   3) prefix Chinese string literals with u, e.g. u'汉字'
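The URL-building step above can be sketched as a small helper. This is an illustration I wrote, not part of the PySpark API; the host, database name, and credentials are placeholders, and the Spark calls are shown commented out because they need a live SQLContext and MySQL server:

```python
# Hypothetical helper: build a MySQL JDBC URL, optionally appending the
# encoding parameters so Chinese (UTF-8) text round-trips correctly.
def mysql_jdbc_url(host, port, db, utf8=True):
    url = "jdbc:mysql://%s:%d/%s" % (host, port, db)
    if utf8:
        url += "?useUnicode=true&characterEncoding=utf-8"
    return url

url = mysql_jdbc_url("127.0.0.1", 3306, "test")
properties = {"user": "root", "password": "111111"}
# With a running Spark 1.6 SQLContext these would read and write the table:
# df = sqlContext.read.jdbc(url, "test", properties=properties)
# df.write.jdbc(url, "test", properties=properties)
```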
2. The jdbc function is shown below. It mainly takes three arguments — url, table, and properties — where properties is a dict of string key/value pairs
def jdbc(self, url, table, mode=None, properties=None):
"""Saves the content of the :class:`DataFrame` to a external database table via JDBC.
.. note:: Don't create too many partitions in parallel on a large cluster;\
otherwise Spark might crash your external database systems.
:param url: a JDBC URL of the form ``jdbc:subprotocol:subname``
:param table: Name of the table in the external database.
:param mode: specifies the behavior of the save operation when data already exists.
* ``append``: Append contents of this :class:`DataFrame` to existing data.
* ``overwrite``: Overwrite existing data.
* ``ignore``: Silently ignore this operation if data already exists.
* ``error`` (default case): Throw an exception if data already exists.
:param properties: JDBC database connection arguments, a list of
arbitrary string tag/value. Normally at least a
"user" and "password" property should be included."""
3. Open question:
What kind of connection does Spark use when it talks to the database — a connection pool, or a plain connection opened per use?
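As far as I know, Spark 1.6's JDBC data source does not pool connections: each task opens a plain connection through Java's DriverManager for its partition and closes it when the partition finishes, so a partitioned read with N partitions opens N separate connections. The sketch below approximates how a numeric column is split into partition ranges for such a read; it is my own illustration of the idea, not Spark's actual `columnPartition` code:

```python
# Approximate sketch of splitting [lower, upper) on a numeric column into
# partition bounds; each resulting partition is scanned over its own
# plain JDBC connection (no pooling). None means an unbounded side.
def column_partitions(lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    bounds = []
    current = lower
    for i in range(num_partitions):
        lo = None if i == 0 else current
        current += stride
        hi = None if i == num_partitions - 1 else current
        bounds.append((lo, hi))
    return bounds

# A partitioned read in Spark 1.6 would look like this (needs a live SQLContext):
# df = sqlContext.read.jdbc(url, "test", column="id", lowerBound=0,
#                           upperBound=1000, numPartitions=4,
#                           properties=properties)
```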