| 85|
| 80|
| 75|
+---+
only showing top 5 rows
+-------+------------------+
|summary|                id|
+-------+------------------+
|  count|                19|
|   mean|              50.0|
| stddev|28.136571693556885|
|    min|                 5|
|    max|                95|
+-------+------------------+
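For reference, a summary table like the one above is what DataFrame.describe() prints. A minimal sketch, assuming a hypothetical DataFrame df whose integer id column was summarized:
//hypothetical df: describe() computes count/mean/stddev/min/max per column
df.describe("id").show()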
2. Creating a DataSet from a collection
First, define a few case classes that may be used later:
//case classes
case class Person(name: String, age: Int, height: Int)
case class People(age: Int, names: String)
case class Score(name: String, grade: Int)
Then import the implicit conversions:
import spark.implicits._
Finally, define a collection and create the DataSet:
//2. Convert a collection to a Dataset
val seq1 = Seq(Person("xzw", 24, 183), Person("yxy", 24, 178), Person("lzq", 25, 168))
val ds1 = spark.createDataset(seq1)
ds1.show()
The result is as follows:
+----+---+------+
|name|age|height|
+----+---+------+
| xzw| 24|   183|
| yxy| 24|   178|
| lzq| 25|   168|
+----+---+------+
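Since spark.implicits._ is already imported, the same Dataset can also be built with the toDS() helper that the implicits add to local collections; a minimal equivalent sketch:
//equivalent: toDS() comes from spark.implicits._
val ds1b = seq1.toDS()
ds1b.show()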
3. Converting an RDD to a DataFrame
//3. Convert an RDD to a DataFrame
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
val array1 = Array((33, 24, 183), (33, 24, 178), (33, 25, 168))
//the schema below declares two fields, so keep only the first two tuple elements
val rdd1 = spark.sparkContext.parallelize(array1, 3).map(f => Row(f._1, f._2))
val schema = StructType(
  StructField("a", IntegerType, false) ::
  StructField("b", IntegerType, true) :: Nil
)
val rddToDataFrame = spark.createDataFrame(rdd1, schema)
rddToDataFrame.show(false)
The result is as follows:
+---+---+
|a  |b  |
+---+---+
|33 |24 |
|33 |24 |
|33 |25 |
+---+---+
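As a side note, when the RDD holds case classes rather than raw Rows, the imported implicits allow a direct toDF() conversion with no hand-built schema; a minimal sketch reusing seq1 and the Person class from step 2:
//alternative: an RDD of case classes carries its own schema
val personRdd = spark.sparkContext.parallelize(seq1)
val personDf = personRdd.toDF() //column names come from the Person fields
personDf.show()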
4. Reading a file
//4. Read a file; here a CSV file is used as an example
val ds2 = spark.read.csv("C://Users//Machenike//Desktop//xzw//test.csv")
ds2.show()
The result is as follows:
+---+---+----+
|_c0|_c1| _c2|
+---+---+----+
|xzw| 24| 183|
|yxy| 24| 178|
|lzq| 25| 168|
+---+---+----+
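The generated _c0/_c1/_c2 names can be replaced right after reading; a small sketch:
//rename the default columns so later code can refer to them by name
val named = ds2.toDF("name", "age", "height")
named.show()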
5. Reading a file with detailed options
//5. Read a file and configure detailed options
val ds3 = spark.read.options(Map(("delimiter", ","), ("header", "false")))
  .csv("C://Users//Machenike//Desktop//xzw//test.csv")
ds3.show()
The result is as follows:
+---+---+----+
|_c0|_c1| _c2|
+---+---+----+
|xzw| 24| 183|
|yxy| 24| 178|
|lzq| 25| 168|
+---+---+----+
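Building on these options, one more hedged sketch: adding inferSchema so the numeric columns come back as integers rather than strings, then naming the columns and converting to a typed Dataset[Person] (this assumes the same file and the Person case class from step 2; without inferSchema every CSV column is a string, and as[Person] would fail):
//read with type inference, name the columns, then get a typed Dataset
val ds4 = spark.read
  .options(Map(("delimiter", ","), ("header", "false"), ("inferSchema", "true")))
  .csv("C://Users//Machenike//Desktop//xzw//test.csv")
  .toDF("name", "age", "height")
  .as[Person]
ds4.show()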