Window functions:
The row_number() window function
groups rows by one column and then takes the first few values of another column within each group, i.e., the classic "top N per group" problem.
row_number() over(partition by xxx order by xxxx desc) xxx
Note:
If a SQL statement uses a window function, it must be executed with HiveContext, which is a subclass of SQLContext.
By default, a HiveContext cannot be created in local mode.
This amounts to taking the top N per group:
first partition the rows, then sort each partition in descending (or ascending) order, then query the top rows.
In row_number() over(partition by xxx order by xxxx desc) xxx, the trailing xxx is the alias given to the generated row number (it is named rank in the query below):
"row_number() over (partition by leibie order by jine desc) rank "
"select riqi,leibie,jine"
+"from ("
+"select riqi,leibie,jine,"
+"row_number() over (partition by leibie order by jine desc) rank "
+"from sales) t"
+"where t.rank<=3"
package com.bjsxt;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;
/**
 * The row_number() window function
 * groups rows by one column and takes the top few values of another column,
 * i.e., the "top N per group" problem.
 * row_number() over(partition by xxx order by xxxx desc) xxx
 * Note:
 * If a SQL statement uses a window function, it must be executed with HiveContext.
 * By default, a HiveContext cannot be created in local mode.
 * @author Administrator
 *
 */
public class Window {
public static void main(String[] args) {
SparkConf conf=new SparkConf();
conf.setAppName("user");
conf.set("spark.sql.shuffle.partitions","1");
JavaSparkContext sc=new JavaSparkContext(conf);
HiveContext hiveContext=new HiveContext(sc);
hiveContext.sql("user spark");
hiveContext.sql("drop table if exists sales");
hiveContext.sql(
"create table if not exists sales (riqi String,leibie String,jine Int)"
+"row format delimited fields terminated by '\t'"
);
hiveContext.sql("load data local inpath '/root/test/sales' into table sales");
/**
 * Window function syntax:
 * row_number() over (partition by XXX order by XXX DESC) as rank
 * Note: rank starts from 1
 */
/**
 * Group by category (leibie),
 * sort each category by amount (jine) in descending order,
 * and display [riqi, leibie, jine]
 * for the top 3 rows of each category.
 */
DataFrame result = hiveContext.sql("select riqi,leibie,jine "
+"from ("
+"select riqi,leibie,jine,"
+"row_number() over (partition by leibie order by jine desc) rank "
+"from sales) t"
+"where t.rank<=3"
);
result.show();
/**
 * Save the result into the Hive table sales_result
 */
result.write().mode(SaveMode.Overwrite).saveAsTable("sales_result");
sc.stop();
}
}
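For reference, the load data local inpath step above expects /root/test/sales to be a tab-separated text file matching the table schema (riqi, leibie, jine). The rows below are purely hypothetical, only to show the layout:

2019-01-01	shuiguo	10
2019-01-01	lingshi	25
2019-01-02	shuiguo	30
2019-01-02	lingshi	40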
Scala version:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
conf.setAppName("windowfun")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
hiveContext.sql("use spark")
hiveContext.sql("drop table if exists sales")
hiveContext.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
  + "row format delimited fields terminated by '\t'")
hiveContext.sql("load data local inpath '/root/test/sales' into table sales")
/**
 * Window function syntax:
 * row_number() over (partition by XXX order by XXX)
 */
val result = hiveContext.sql("select riqi,leibie,jine "
  + "from ("
  + "select riqi,leibie,jine,"
  + "row_number() over (partition by leibie order by jine desc) rank "
  + "from sales) t "
  + "where t.rank<=3")
result.show()
sc.stop()
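As an aside, the same top-N query can also be written with the DataFrame API instead of SQL. This is only a sketch, assuming Spark 1.6+ (where row_number() is exposed in org.apache.spark.sql.functions) and reusing the hiveContext and sales table from above:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// number rows within each leibie partition, highest jine first
val w = Window.partitionBy("leibie").orderBy(col("jine").desc)

val top3 = hiveContext.table("sales")
  .withColumn("rank", row_number().over(w)) // rank starts from 1
  .where(col("rank") <= 3)                  // keep the top 3 per category
  .select("riqi", "leibie", "jine")

top3.show()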
Thank you for your support. I will work even harder to write better posts for you.