Spark2.X之scala JDBCDataSource

本文介绍如何使用 Apache Spark SQL 从 MySQL 数据库读取学生基本信息和成绩数据,并通过 join 和 filter 操作筛选出成绩大于等于80分的学生,最终将这些学生的信息存入新的表中。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

/**

1.获取MySQL数据库中学生基本信息表与学生成绩表

2.通过join操作与filter操作将学生分数大于等于80分的学生存入好学生的表中。

**/




import org.apache.spark.sql.SparkSession
import org.apache.commons.collections.map.HashedMap
import java.util.Properties
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType,StructField,StringType};
import org.apache.spark.sql.types.IntegerType
/**
 * @author Administrator
 */
object  JDBCDataSource {
  
  def main(args: Array[String]): Unit = {
    
    val spark = SparkSession.
      builder().
      appName("JDBCDataSource").
      master("local").
      getOrCreate() 
      val connectionProperties = new Properties()
      connectionProperties.put("user", "root")
      connectionProperties.put("password", "root")
      val student_infos = spark.read.jdbc("jdbc:mysql://20.0.20.203/testdb", "student_infos", connectionProperties)
      val student_scores = spark.read.jdbc("jdbc:mysql://20.0.20.203/testdb", "student_scores", connectionProperties) 
      val studentsRDD = student_infos.join(student_scores,"name").filter("score >=80").rdd
      val structType = StructType(Array(
        StructField("name", StringType, true),
        StructField("score", IntegerType, true),
        StructField("age", IntegerType, true)))
     val studentsDF = spark.createDataFrame(studentsRDD, structType)
     studentsDF.write.jdbc("jdbc:mysql://20.0.20.203/testdb", "good_student_infos", connectionProperties)
      
     spark.close()
 
  }
}
root@hadoop01 apache-hive-3.1.3-bin]# bin/hive which: no hbase in (:/export/servers/apache-hive-3.1.3-bin/bin:/export/servers/flume-1.9.0/bin:/export/servers/hadoop-3.3.5/bin::/export/servers/apache-hive-3.1.3-bin/bin:/export/servers/flume-1.9.0/bin::/export/servers/apache-hive-3.1.3-bin/bin:/export/servers/flume-1.9.0/bin:/export/servers/flume-1.9.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/export/servers/jdk1.8.0_161/bin:/export/servers/hadoop-3.3.5/bin:/export/servers/hadoop-3.3.5/sbin:/export/servers/scala-2.12.10/bin:/root/bin:/export/servers/jdk1.8.0_161/bin:/export/servers/hadoop-3.3.5/bin:/export/servers/hadoop-3.3.5/sbin:/export/servers/scala-2.12.10/bin:/export/servers/jdk1.8.0_161/bin:/export/servers/hadoop-3.3.5/bin:/export/servers/hadoop-3.3.5/sbin:/export/servers/scala-2.12.10/bin:/root/bin:/export/servers/jdk1.8.0_161/bin:/export/servers/hadoop-3.3.5/bin:/export/servers/hadoop-3.3.5/sbin:/export/servers/scala-2.12.10/bin:/export/servers/sqoop/bin:) 2025-06-17 18:31:23,734 INFO conf.HiveConf: Found configuration file file:/export/servers/apache-hive-3.1.3-bin/conf/hive-site.xml Hive Session ID = 1bb412dc-6394-489e-80ca-943bacb068f6 2025-06-17 18:31:29,776 INFO SessionState: Hive Session ID = 1bb412dc-6394-489e-80ca-943bacb068f6 Logging initialized using configuration in jar:file:/export/servers/apache-hive-3.1.3-bin/lib/hive-common-3.1.3.jar!/hive-log4j2.properties Async: true 2025-06-17 18:31:29,920 INFO SessionState: Logging initialized using configuration in jar:file:/export/servers/apache-hive-3.1.3-bin/lib/hive-common-3.1.3.jar!/hive-log4j2.properties Async: true 2025-06-17 18:31:33,217 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/1bb412dc-6394-489e-80ca-943bacb068f6 2025-06-17 18:31:33,267 INFO session.SessionState: Created local directory: /tmp/root/1bb412dc-6394-489e-80ca-943bacb068f6 2025-06-17 18:31:33,280 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/1bb412dc-6394-489e-80ca-943bacb068f6/_tmp_space.db 2025-06-17 18:31:33,323 INFO conf.HiveConf: Using the default value passed in for log id: 1bb412dc-6394-489e-80ca-943bacb068f6 2025-06-17 18:31:33,323 INFO session.SessionState: Updating thread name to 1bb412dc-6394-489e-80ca-943bacb068f6 main 2025-06-17 18:31:35,971 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore 2025-06-17 18:31:36,007 WARN metastore.ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored 2025-06-17 18:31:36,017 INFO metastore.ObjectStore: ObjectStore, initialize called 2025-06-17 18:31:36,019 INFO conf.MetastoreConf: Found configuration file file:/export/servers/apache-hive-3.1.3-bin/conf/hive-site.xml 2025-06-17 18:31:36,021 INFO conf.MetastoreConf: Unable to find config file hivemetastore-site.xml 2025-06-17 18:31:36,021 INFO conf.MetastoreConf: Found configuration file null 2025-06-17 18:31:36,023 INFO conf.MetastoreConf: Unable to find config file metastore-site.xml 2025-06-17 18:31:36,023 INFO conf.MetastoreConf: Found configuration file null 2025-06-17 18:31:36,357 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 2025-06-17 18:31:36,899 INFO hikari.HikariDataSource: HikariPool-1 - Starting... 2025-06-17 18:31:37,442 INFO hikari.HikariDataSource: HikariPool-1 - Start completed. 2025-06-17 18:31:37,532 INFO hikari.HikariDataSource: HikariPool-2 - Starting... 2025-06-17 18:31:37,674 INFO hikari.HikariDataSource: HikariPool-2 - Start completed. 2025-06-17 18:31:38,661 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 2025-06-17 18:31:39,025 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 2025-06-17 18:31:39,031 INFO metastore.ObjectStore: Initialized ObjectStore 2025-06-17 18:31:39,705 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:39,706 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:39,707 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:39,707 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:39,708 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:39,709 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,370 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,371 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,372 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,372 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,372 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:43,372 WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 2025-06-17 18:31:48,380 WARN metastore.ObjectStore: Version information not found in metastore. metastore.schema.verification is not enabled so recording the schema version 3.1.0 2025-06-17 18:31:48,380 WARN metastore.ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 3.1.0, comment = Set by MetaStore root@192.168.245.131 2025-06-17 18:31:48,917 INFO metastore.HiveMetaStore: Added admin role in metastore 2025-06-17 18:31:48,922 INFO metastore.HiveMetaStore: Added public role in metastore 2025-06-17 18:31:49,028 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty 2025-06-17 18:31:49,426 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=root (auth:SIMPLE) retries=1 delay=1 lifetime=0 2025-06-17 18:31:49,471 INFO metastore.HiveMetaStore: 0: get_all_functions 2025-06-17 18:31:49,474 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_all_functions Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. 2025-06-17 18:31:49,645 INFO CliDriver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Hive Session ID = 4bd9c9da-8eb4-44a2-948e-faa5053d60f3 2025-06-17 18:31:49,645 INFO SessionState: Hive Session ID = 4bd9c9da-8eb4-44a2-948e-faa5053d60f3 2025-06-17 18:31:49,697 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/4bd9c9da-8eb4-44a2-948e-faa5053d60f3 2025-06-17 18:31:49,703 INFO session.SessionState: Created local directory: /tmp/root/4bd9c9da-8eb4-44a2-948e-faa5053d60f3 2025-06-17 18:31:49,710 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/4bd9c9da-8eb4-44a2-948e-faa5053d60f3/_tmp_space.db 2025-06-17 18:31:49,713 INFO metastore.HiveMetaStore: 1: get_databases: @hive# 2025-06-17 18:31:49,714 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_databases: @hive# 2025-06-17 18:31:49,716 INFO metastore.HiveMetaStore: 1: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore 2025-06-17 18:31:49,720 INFO metastore.ObjectStore: ObjectStore, initialize called 2025-06-17 18:31:49,755 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 2025-06-17 18:31:49,756 INFO metastore.ObjectStore: Initialized ObjectStore 2025-06-17 18:31:49,768 INFO metastore.HiveMetaStore: 1: get_tables_by_type: db=@hive#db_hive1 pat=.*,type=MATERIALIZED_VIEW 2025-06-17 18:31:49,769 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_tables_by_type: db=@hive#db_hive1 pat=.*,type=MATERIALIZED_VIEW 2025-06-17 18:31:49,789 INFO metastore.HiveMetaStore: 1: get_multi_table : db=db_hive1 tbls= 2025-06-17 18:31:49,793 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_multi_table : db=db_hive1 tbls= 2025-06-17 18:31:49,795 INFO metastore.HiveMetaStore: 1: get_tables_by_type: db=@hive#default pat=.*,type=MATERIALIZED_VIEW 2025-06-17 18:31:49,796 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_tables_by_type: db=@hive#default pat=.*,type=MATERIALIZED_VIEW 2025-06-17 18:31:49,803 INFO metastore.HiveMetaStore: 1: get_multi_table : db=default tbls= 2025-06-17 18:31:49,803 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_multi_table : db=default tbls= 2025-06-17 18:31:49,803 INFO metadata.HiveMaterializedViewsRegistry: Materialized views registry has been initialized hive>
最新发布
06-18
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值