2017-02-20

http://jingpin.jikexueyuan.com/article/22803.html

http://blog.youkuaiyun.com/ustcxjt/article/details/7313557

http://www.cnblogs.com/liuxianan/archive/2013/01/15/2861196.html

https://zhidao.baidu.com/question/54711775.html

http://blog.youkuaiyun.com/guojing505123/article/details/53579973

http://blog.youkuaiyun.com/convict_eva/article/details/52883041

http://www.th7.cn/Program/java/201604/845924.shtml

http://blog.youkuaiyun.com/zzq900503/article/details/54709801

http://blog.youkuaiyun.com/BuquTianya/article/category/2492477

http://blog.youkuaiyun.com/matthewei6/article/details/50709212

Reposted from: https://www.cnblogs.com/royi123/p/6419714.html

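This requirement, a running total of visits per user, can be implemented with SparkSQL window functions. The steps are as follows:

1. Create a SparkSession object

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("user_visit_count").getOrCreate()
```

2. Load the data and create a Spark DataFrame

```
data = [("u01", "2017/1/21", 5), ("u02", "2017/1/23", 6), ("u03", "2017/1/22", 8),
        ("u04", "2017/1/20", 3), ("u01", "2017/1/23", 6), ("u01", "2017/2/21", 8),
        ("u02", "2017/1/23", 6), ("u01", "2017/2/22", 4)]
df = spark.createDataFrame(data, ["userID", "visitDate", "visitCount"])
```

3. Convert the visitDate field to a month. visitDate is a string such as 2017/1/21, so it has to be parsed with to_date and an explicit pattern first; passing the raw string straight to date_format would produce null.

```
from pyspark.sql.functions import date_format, to_date

# Parse "yyyy/M/d" strings into dates, then format them as "yyyy-MM"
df = df.withColumn("month", date_format(to_date("visitDate", "yyyy/M/d"), "yyyy-MM"))
```

4. Use a window function to compute the cumulative visit count. The window is ordered by the parsed date and uses an explicit row-based frame. Ordering by month alone with the default frame would give every row in the same month the same total.

```
from pyspark.sql.window import Window
from pyspark.sql.functions import sum as spark_sum  # avoid shadowing the built-in sum

windowSpec = (
    Window.partitionBy("userID")
    .orderBy(to_date("visitDate", "yyyy/M/d"))
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)
df = df.withColumn("cumulativeCount", spark_sum("visitCount").over(windowSpec))
```

5. Sort the result and display it

```
# cumulativeCount breaks ties between rows that share a user and month
df = df.sort("userID", "month", "cumulativeCount")
df.show()
```

The final output should look like this:

```
+------+---------+----------+-------+---------------+
|userID|visitDate|visitCount|  month|cumulativeCount|
+------+---------+----------+-------+---------------+
|   u01|2017/1/21|         5|2017-01|              5|
|   u01|2017/1/23|         6|2017-01|             11|
|   u01|2017/2/21|         8|2017-02|             19|
|   u01|2017/2/22|         4|2017-02|             23|
|   u02|2017/1/23|         6|2017-01|              6|
|   u02|2017/1/23|         6|2017-01|             12|
|   u03|2017/1/22|         8|2017-01|              8|
|   u04|2017/1/20|         3|2017-01|              3|
+------+---------+----------+-------+---------------+
```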
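To sanity-check the cumulativeCount values without a Spark cluster, the same per-user running total can be sketched in plain Python. This is only an illustration of what the window computes, assuming rows are accumulated per user in date order. The helper name `running_totals` is hypothetical and not part of PySpark.

```python
from collections import defaultdict
from datetime import datetime

# Same sample rows as above: (userID, visitDate, visitCount)
data = [("u01", "2017/1/21", 5), ("u02", "2017/1/23", 6), ("u03", "2017/1/22", 8),
        ("u04", "2017/1/20", 3), ("u01", "2017/1/23", 6), ("u01", "2017/2/21", 8),
        ("u02", "2017/1/23", 6), ("u01", "2017/2/22", 4)]


def running_totals(rows):
    """Hypothetical helper: per-user running total of visitCount in date order.

    Returns (userID, visitDate, visitCount, month, cumulativeCount) tuples,
    sorted by user and then by date.
    """
    # Parse the "yyyy/M/d" strings so that ordering is by date, not by text
    parsed = [(user, datetime.strptime(date, "%Y/%m/%d"), date, count)
              for user, date, count in rows]
    parsed.sort(key=lambda r: (r[0], r[1]))

    totals = defaultdict(int)  # one accumulator per user (the "partition")
    result = []
    for user, day, raw_date, count in parsed:
        totals[user] += count
        result.append((user, raw_date, count, day.strftime("%Y-%m"), totals[user]))
    return result


if __name__ == "__main__":
    for row in running_totals(data):
        print(row)
```

Each user has its own accumulator, which plays the role of partitionBy("userID"). Sorting by the parsed date before accumulating corresponds to the ordering of the window.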