Databricks 加载MySQL数据

最新推荐文章于 2025-04-11 07:45:00 发布

原创最新推荐文章于 2025-04-11 07:45:00 发布 · 592 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#mysql #数据库 #spark

本文介绍如何在Databricks环境中加载MySQL数据库的数据，并演示了数据处理过程，包括创建临时视图、修改列名、表连接及排序等操作。

databricks加载MySQL数据，需要先安装MySQL的驱动包。可以从集群的Libraries安装。上传jar包即可。

查询如下

driver = "com.mysql.jdbc.Driver"
url = dbutils.secrets.get(scope = "db_test", key = "url")
user = dbutils.secrets.get(scope = "db_test", key = "username")
password = dbutils.secrets.get(scope = "db_test", key = "password")
obj = spark.read.format("jdbc").option("driver", driver).\
option("url", url).option("user", user).option("password", password)

table_sql = "(select * from students) tmp"
df_students = obj.option("dbtable", table_sql).load()
display(df_students)

创建临时表

df_students.createOrReplaceTempView("test_student")

修改列名

df_students_new = df_students.withColumnRenamed("name","student_name" )

表连接

#全连接
df_union1 = df_students_new.join(df_students,"id","full")
#左外连接
df_union2 = df_students_new.join(df_students,"id","left_outer")
#右外连接
df_union3 = df_students_new.join(df_students,"id","right_outer")

排序