The Road to Advanced Big Data: Spark SQL DataFrame & Dataset

```scala
    peopleDF.filter(peopleDF.col("age") > 19).show()

    // Group by a column, then aggregate: select age, count(1) from table group by age
    peopleDF.groupBy("age").count().show()

    spark.stop()

  }

}
```



```
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

+-------+
|   name|
+-------+
|Michael|
|   Andy|
| Justin|
+-------+

+-------+----+
|   name|age2|
+-------+----+
|Michael|null|
|   Andy|  40|
| Justin|  29|
+-------+----+

+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+

+----+-----+
| age|count|
+----+-----+
|  19|    1|
|null|    1|
|  30|    1|
+----+-----+
```
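The listing above starts partway through the program, so the calls that print the schema and the first three tables are not shown. The following is a minimal sketch of what the whole program might look like, inferred only from the output. The object name `DataFrameSketch`, the helper `loadPeople`, and the inline JSON records are assumptions made for illustration; the original presumably read a JSON file whose path is not shown. Reading JSON from a `Dataset[String]` requires Spark 2.2 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameSketch {

  // Assumption: three JSON records chosen to match the rows in the output above.
  def loadPeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val records = Seq(
      """{"name":"Michael"}""",
      """{"name":"Andy", "age":30}""",
      """{"name":"Justin", "age":19}"""
    ).toDS()
    spark.read.json(records)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataFrameSketch").master("local[2]").getOrCreate()
    val peopleDF = loadPeople(spark)

    // Print the inferred schema
    peopleDF.printSchema()

    // Show the rows (at most 20 by default)
    peopleDF.show()

    // Select a single column: select name from table
    peopleDF.select("name").show()

    // Select columns and compute on one of them: select name, age + 10 as age2 from table
    peopleDF.select(peopleDF.col("name"), (peopleDF.col("age") + 10).as("age2")).show()

    // Filter on a column value: select * from table where age > 19
    peopleDF.filter(peopleDF.col("age") > 19).show()

    // Group by a column, then aggregate: select age, count(1) from table group by age
    peopleDF.groupBy("age").count().show()

    spark.stop()
  }
}
```

Keeping the data inline makes the sketch runnable without an input file; replacing `loadPeople` with `spark.read.format("json").load(path)` would be the file-based equivalent.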


#### DataFrame and RDD



```scala
package org.example

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object DataFrameRDDApp {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataFrameRDDApp").master("local[2]").getOrCreate()

    // inferReflection(spark)
    program(spark)

    spark.stop()
  }

  // Programmatic approach: build the schema by hand and apply it to an RDD[Row]
  def program(spark: SparkSession): Unit = {
    // RDD ==> DataFrame
    val rdd = spark.sparkContext.textFile("infos.txt")

    val infoRDD = rdd.map(_.split(",")).map(line => Row(line(0).toInt, line(1), line(2).toInt))

    val structType = StructType(Array(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true)))

    val infoDF = spark.createDataFrame(infoRDD, structType)
    infoDF.printSchema()
    infoDF.show()

    // Query through the DataFrame API
    infoDF.filter(infoDF.col("age") > 70).show()

    // Query through SQL
    infoDF.createOrReplaceTempView("infos")
    spark.sql("select * from infos where age > 70").show()
  }

  // Reflection approach: the schema is inferred from the Info case class
  def inferReflection(spark: SparkSession): Unit = {
    // RDD ==> DataFrame
    val rdd = spark.sparkContext.textFile("infos.txt")

    // Note: the implicit conversions must be imported for toDF()
    import spark.implicits._
    val infoDF = rdd.map(_.split(",")).map(line => Info(line(0).toInt, line(1), line(2).toInt)).toDF()

    infoDF.show()

    infoDF.filter(infoDF.col("age") > 70).show()

    infoDF.createOrReplaceTempView("infos")
    spark.sql("select * from infos where age > 70").show()
  }

  case class Info(id: Int, name: String, age: Int)

}
```



```
root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)

+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|hello| 66|
|  2|world| 89|
|  3|spark| 88|
+---+-----+---+

+---+-----+---+
| id| name|age|
+---+-----+---+
|  2|world| 89|
|  3|spark| 88|
+---+-----+---+

+---+-----+---+
| id| name|age|
+---+-----+---+
|  2|world| 89|
|  3|spark| 88|
+---+-----+---+
```


#### Example



```scala
package org.example

import org.apache.spark.sql.SparkSession

/**
 * Other DataFrame operations
 */
object DataFrameProjectApp {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataFrameProjectApp").master("local[2]").getOrCreate()

    // RDD ==> DataFrame
    val rdd = spark.sparkContext.textFile("Student.data")

    // Note: the implicit conversions must be imported for toDF()
    import spark.implicits._
    val studentDF = rdd.map(_.split("\\|")).map(line => Student(line(0).toInt, line(1), line(2), line(3))).toDF()

    // show displays only the first 20 rows by default
    studentDF.show()
    studentDF.show(30)
    // Passing false disables truncation of long cell values
    studentDF.show(30, false)

    // These return rows to the driver; the results are not printed here
    studentDF.take(10)
    studentDF.first()
    studentDF.head(3)

    studentDF.select("email").show(30, false)

    // Rows whose name is empty, or empty or the literal string 'NULL'
    studentDF.filter("name=''").show()
    studentDF.filter("name='' OR name='NULL'").show()

    // People whose name starts with M
    // (SUBSTR positions are 1-based; Spark also treats position 0 as the first character)
    studentDF.filter("SUBSTR(name,0,1)='M'").show()

    studentDF.sort(studentDF("name")).show()
    studentDF.sort(studentDF("name").desc).show()

    studentDF.sort("name", "id").show()
    studentDF.sort(studentDF("name").asc, studentDF("id").desc).show()

    studentDF.select(studentDF("name").as("student_name")).show()

    val studentDF2 = rdd.map(_.split("\\|")).map(line => Student(line(0).toInt, line(1), line(2), line(3))).toDF()

    // Inner join on id
    studentDF.join(studentDF2, studentDF.col("id") === studentDF2.col("id")).show()

    spark.stop()
  }

  case class Student(id: Int, name: String, phone: String, email: String)

}
```



```
+---+--------+--------------+--------------------+
| id|    name|         phone|               email|
+---+--------+--------------+--------------------+
|  1|   Burke|1-300-746-8446|ullamcorper.velit...|
|  2|   Kamal|1-668-571-5046|pede.Suspendisse@...|
|  3|    Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|   Belle|1-246-894-6340|vitae.aliquet.nec...|
|  5|  Trevor|1-300-527-4967|dapibus.id@acturp...|
|  6|  Laurel|1-691-379-9921|adipiscing@consec...|
|  7|    Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|  Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|     Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|    Maya|1-271-683-2698|accumsan.convalli...|
| 11|     Emi|1-467-270-1337|        est@nunc.com|
| 12|   Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|   Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|   Tarik|1-398-171-2268|turpis@felisorci.com|
| 16|   Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18|     Guy|1-869-521-3230|senectus.et.netus...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20|  Edward|1-711-710-6552|lectus@aliquetlib...|
+---+--------+--------------+--------------------+
only showing top 20 rows
```

```
+---+--------+--------------+--------------------+
| id|    name|         phone|               email|
+---+--------+--------------+--------------------+
|  1|   Burke|1-300-746-8446|ullamcorper.velit...|
|  2|   Kamal|1-668-571-5046|pede.Suspendisse@...|
|  3|    Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|   Belle|1-246-894-6340|vitae.aliquet.nec...|
|  5|  Trevor|1-300-527-4967|dapibus.id@acturp...|
|  6|  Laurel|1-691-379-9921|adipiscing@consec...|
|  7|    Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|  Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|     Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|    Maya|1-271-683-2698|accumsan.convalli...|
| 11|     Emi|1-467-270-1337|        est@nunc.com|
| 12|   Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|   Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|   Tarik|1-398-171-2268|turpis@felisorci.com|
| 16|   Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18|     Guy|1-869-521-3230|senectus.et.netus...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20|  Edward|1-711-710-6552|lectus@aliquetlib...|
| 21|        |1-711-710-6552|lectus@aliquetlib...|
| 22|        |1-711-710-6552|lectus@aliquetlib...|
| 23|    NULL|1-711-710-6552|lectus@aliquetlib...|
+---+--------+--------------+--------------------+
```

```
+---+--------+--------------+-----------------------------------------+
|id |name    |phone         |email                                    |
+---+--------+--------------+-----------------------------------------+
|1  |Burke   |1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk|
|2  |Kamal   |1-668-571-5046|pede.Suspendisse@interdumenim.edu        |
|3  |Olga    |1-956-311-1686|Aenean.eget.metus@dictumcursusNunc.edu   |
|4  |Belle   |1-246-894-6340|vitae.aliquet.nec@neque.co.uk            |
|5  |Trevor  |1-300-527-4967|dapibus.id@acturpisegestas.net           |
|6  |Laurel  |1-691-379-9921|adipiscing@consectetueripsum.edu         |
|7  |Sara    |1-608-140-1995|Donec.nibh@enimEtiamimperdiet.edu        |
|8  |Kaseem  |1-881-586-2689|cursus.et.magna@euismod.org              |
|9  |Lev     |1-916-367-5608|Vivamus.nisi@ipsumdolor.com              |
|10 |Maya    |1-271-683-2698|accumsan.convallis@ornarelectusjusto.edu |
|11 |Emi     |1-467-270-1337|est@nunc.com                             |
|12 |Caleb   |1-683-212-0896|Suspendisse@Quisque.edu                  |
|13 |Florence|1-603-575-2444|sit.amet.dapibus@lacusAliquamrutrum.ca   |
|14 |Anika   |1-856-828-7883|euismod@ligulaelit.co.uk                 |
|15 |Tarik   |1-398-171-2268|turpis@felisorci.com                     |
|16 |Amena   |1-878-250-3129|lorem.luctus.ut@scelerisque.com          |
|17 |Blossom |1-154-406-9596|Nunc.commodo.auctor@eratSed.co.uk        |
|18 |Guy     |1-869-521-3230|senectus.et.netus@lectusrutrum.com       |
|19 |Malachi |1-608-637-2772|Proin.mi.Aliquam@estarcu.net             |
|20 |Edward  |1-711-710-6552|lectus@aliquetlibero.co.uk               |
|21 |        |1-711-710-6552|lectus@aliquetlibero.co.uk               |
|22 |        |1-711-710-6552|lectus@aliquetlibero.co.uk               |
|23 |NULL    |1-711-710-6552|lectus@aliquetlibero.co.uk               |
+---+--------+--------------+-----------------------------------------+
```

```
+-----------------------------------------+
|email                                    |
+-----------------------------------------+
|ullamcorper.velit.in@ametnullaDonec.co.uk|
|pede.Suspendisse@interdumenim.edu        |
|Aenean.eget.metus@dictumcursusNunc.edu   |
|vitae.aliquet.nec@neque.co.uk            |
|dapibus.id@acturpisegestas.net           |
|adipiscing@consectetueripsum.edu         |
|Donec.nibh@enimEtiamimperdiet.edu        |
|cursus.et.magna@euismod.org              |
|Vivamus.nisi@ipsumdolor.com              |
|accumsan.convallis@ornarelectusjusto.edu |
|est@nunc.com                             |
|Suspendisse@Quisque.edu                  |
|sit.amet.dapibus@lacusAliquamrutrum.ca   |
|euismod@ligulaelit.co.uk                 |
|turpis@felisorci.com                     |
|lorem.luctus.ut@scelerisque.com          |
|Nunc.commodo.auctor@eratSed.co.uk        |
|senectus.et.netus@lectusrutrum.com       |
|Proin.mi.Aliquam@estarcu.net             |
|lectus@aliquetlibero.co.uk               |
|lectus@aliquetlibero.co.uk               |
|lectus@aliquetlibero.co.uk               |
|lectus@aliquetlibero.co.uk               |
+-----------------------------------------+
```

```
+---+----+--------------+--------------------+
| id|name|         phone|               email|
+---+----+--------------+--------------------+
| 21|    |1-711-710-6552|lectus@aliquetlib...|
| 22|    |1-711-710-6552|lectus@aliquetlib...|
+---+----+--------------+--------------------+

+---+----+--------------+--------------------+
| id|name|         phone|               email|
+---+----+--------------+--------------------+
| 21|    |1-711-710-6552|lectus@aliquetlib...|
| 22|    |1-711-710-6552|lectus@aliquetlib...|
| 23|NULL|1-711-710-6552|lectus@aliquetlib...|
+---+----+--------------+--------------------+

+---+-------+--------------+--------------------+
| id|   name|         phone|               email|
+---+-------+--------------+--------------------+
| 10|   Maya|1-271-683-2698|accumsan.convalli...|
| 19|Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
+---+-------+--------------+--------------------+
```

```
+---+--------+--------------+--------------------+
| id|    name|         phone|               email|
+---+--------+--------------+--------------------+
| 21|        |1-711-710-6552|lectus@aliquetlib...|
| 22|        |1-711-710-6552|lectus@aliquetlib...|
| 16|   Amena|1-878-250-3129|lorem.luctus.ut@s...|
```
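The string filter expressions used in the example can also be written with the Column API, which lets the compiler check the expression structure instead of parsing it at runtime. The following is a minimal sketch under stated assumptions: the object name `ColumnFilterSketch`, the helpers `missingNames` and `namesStartingWith`, and the sample rows are hypothetical and stand in for the contents of `Student.data`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnFilterSketch {

  // Same shape as the Student case class in the example
  case class Student(id: Int, name: String, phone: String, email: String)

  // Equivalent of filter("name='' OR name='NULL'"):
  // rows whose name is empty or the literal string "NULL"
  def missingNames(df: DataFrame): DataFrame =
    df.filter(df("name") === "" || df("name") === "NULL")

  // Equivalent of filter("SUBSTR(name,0,1)='M'") when prefix is "M"
  def namesStartingWith(df: DataFrame, prefix: String): DataFrame =
    df.filter(df("name").startsWith(prefix))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ColumnFilterSketch").master("local[2]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, not taken from the original data file
    val studentDF = Seq(
      Student(1, "Maya", "000-0000", "maya@example.com"),
      Student(2, "", "000-0000", "blank@example.com"),
      Student(3, "NULL", "000-0000", "null@example.com"),
      Student(4, "Olga", "000-0000", "olga@example.com")
    ).toDF()

    missingNames(studentDF).show()
    namesStartingWith(studentDF, "M").show()

    spark.stop()
  }
}
```

Both styles produce the same query plan for simple predicates like these, so the choice is mostly about readability and how early mistakes are caught.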
