I. From SQL to RDD
1. A simple example
Sample data: test.json
{"name":"上海滩","singer":"叶丽仪","album":"香港电视剧主题歌","path":"mp3/shanghaitan.mp3"}
{"name":"一生何求","singer":"陈百强","album":"香港电视剧主题歌","path":"mp3/shanghaitan.mp3"}
{"name":"红日","singer":"李克勤","album":"怀旧专辑","path":"mp3/shanghaitan.mp3"}
{"name":"爱如潮水","singer":"张信哲","album":"怀旧专辑","path":"mp3/airucaoshun.mp3"}
{"name":"红茶馆","singer":"陈惠嫻","album":"怀旧专辑","path":"mp3/redteabar.mp3"}
Read the file with Spark SQL to get a DataFrame:
scala> spark.read.json("test.json")
res1: org.apache.spark.sql.DataFrame = [album: string, name: string ... 2 more fields]
scala> res1.show(false)
+--------+----+-------------------+------+
|album   |name|path               |singer|
+--------+----+-------------------+------+
|香港电视剧主题歌|上海滩 |mp3/shanghaitan.mp3|叶丽仪   |
|香港电视剧主题歌|一生何求|mp3/shanghaitan.mp3|陈百强   |
|怀旧专辑    |红日  |mp3/shanghaitan.mp3|李克勤   |
|怀旧专辑    |爱如潮水|mp3/airucaoshun.mp3|张信哲   |
|怀旧专辑    |红茶馆 |mp3/redteabar.mp3  |陈惠嫻   |
+--------+----+-------------------+------+
Run a simple query:
scala> res1.createOrReplaceTempView("test")
scala> spark.sql(