Four ways to create a DataFrame in Spark:
1. Creating a DataFrame by reading a file:
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").appName("mytest").getOrCreate()
  // "header" = "true" means the file has a header row, used as column names
  // this loads a local file; for HDFS, use an hdfs:// path instead
  val course = spark.read.format("csv").option("header", "true").load("file:///f:/course.csv")
  course.show()
  // Spark provides many operators for working with the in-memory table,
  // or you can register a temp view and query it directly with spark.sql:
  //course.createOrReplaceTempView("course")
  //spark.sql("select * from course").show()
  // the same approach works for JSON files
  val mjson = spark.read.format("json").load("file:///f:/k.json")
  mjson.show()
}
+----+------+----+
|c_id|c_name|t_id|
+----+------+----+
| 01| 语文| 02|
| 02| 数学| 01|
| 03| 英语| 03|
+----+------+----+
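The commented-out lines in the snippet above mention the temp-view approach only in passing. Here is a minimal standalone sketch of it, using an in-memory Seq in place of the CSV file so it runs without any external data (assuming Spark is on the classpath):

```scala
import org.apache.spark.sql.SparkSession

// a local SparkSession, mirroring the article's setup
val spark = SparkSession.builder().master("local[1]").appName("tempview-demo").getOrCreate()
import spark.implicits._

// build the same course data in memory instead of reading the CSV
val course = Seq(("01", "语文", "02"), ("02", "数学", "01"), ("03", "英语", "03"))
  .toDF("c_id", "c_name", "t_id")

// register the DataFrame as a temp view so it can be queried with plain SQL
course.createOrReplaceTempView("course")
val mathOnly = spark.sql("select c_name from course where t_id = '01'")
mathOnly.show()
```

The temp view lives only for the lifetime of the SparkSession; anything expressible with DataFrame operators can also be written as SQL against the view.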
2. Creating a DataFrame from a Seq:
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").appName("mytest").getOrCreate()
  // a Seq of tuples becomes a DataFrame; toDF assigns the column names
  val course = spark.createDataFrame(Seq(
    ("01", "语文", "02"),
    ("02", "数学", "01"),
    ("03", "英语", "03")
  )).toDF("c_id", "c_name", "t_id")
  course.show()
}
+----+------+----+
|c_id|c_name|t_id|
+----+------+----+
| 01| 语文| 02|
| 02| 数学| 01|
| 03| 英语| 03|
+----+------+----+
3. Creating a DataFrame with an explicit schema:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").appName("mytest").getOrCreate()
  // define the schema explicitly; the third argument marks the column as nullable
  val schema = StructType(
    List(
      StructField("c_id", StringType, true),
      StructField("c_name", StringType, true),
      StructField("t_id", StringType, true)
    )
  )
  // pair an RDD of Rows with the schema to build the DataFrame
  val course = spark.createDataFrame(spark.sparkContext.parallelize(Seq(
    Row("1", "语文", "2"),
    Row("2", "数学", "1"),
    Row("3", "英语", "3")
  )), schema)
  course.show()
}
+----+------+----+
|c_id|c_name|t_id|
+----+------+----+
| 1| 语文| 2|
| 2| 数学| 1|
| 3| 英语| 3|
+----+------+----+
4. Creating a DataFrame by reading from a database
The MySQL database contains the following table:
mysql> use mydemo;
Database changed
mysql> select * from course;
+------+--------+------+
| c_id | c_name | t_id |
+------+--------+------+
| 01 | 语文 | 02 |
| 02 | 数学 | 01 |
| 03 | 英语 | 03 |
+------+--------+------+
3 rows in set (0.00 sec)
Spark code
Note: the MySQL JDBC driver jar must be on the classpath.
import java.util

import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[1]").appName("mytest").getOrCreate()
  // JDBC connection options; adjust url/user/password for your environment
  val options = new util.HashMap[String, String]()
  options.put("url", "jdbc:mysql://192.168.56.21:3306/mydemo")
  options.put("driver", "com.mysql.jdbc.Driver")
  options.put("user", "root")
  options.put("password", "ok")
  options.put("dbtable", "course")
  val course = spark.read.format("jdbc").options(options).load()
  course.show()
}
+----+------+----+
|c_id|c_name|t_id|
+----+------+----+
| 01| 语文| 02|
| 02| 数学| 01|
| 03| 英语| 03|
+----+------+----+
Adapted from: https://blog.youkuaiyun.com/weixin_42487460/article/details/107646613