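This article draws on the GitHub project RoadOfStudySpark (Spark学习之路, "The Road to Learning Spark") by yangtong123:
https://github.com/yangtong123/RoadOfStudySpark
The code has been modified slightly and is shared here for discussion and study only.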
Problem Introduction
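E-commerce is booming, and understanding users' preferences and spending habits means analyzing their shopping behavior. Because real user data is hard to obtain, I followed yangtong123's approach and used Scala to simulate a user behavior dataset, then analyzed the behavior it contains. The results are for learning purposes only.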
Simulating the Experimental Dataset
package com.spark.sql.news

import java.io.{FileOutputStream, OutputStreamWriter, PrintWriter}
import java.text.SimpleDateFormat
import java.util.{Calendar, Date}
import scala.util.Random

object OfflineDataGenerator {
  def main(args: Array[String]): Unit = {
    val buffer = new StringBuilder("")
    val sdf = new SimpleDateFormat("yyyy-MM-dd")
    val random = new Random
    // Product sections used in the simulated records
    val sections = Array[String]("Electronic",
      "Clothing", "Books", "Home Appliances", "Foods",
      "Sports", "Toys", "BeautyProducts", "Furniture", "DigitalMedia")
    // Types of user actions used in the simulated records
    val actions = Array[String]("view", "purchase", "add_to_Cart",
      "select", "add_to_WishList", "dislike")
    val newOldUserArr = Array[Int](1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    // Generate the date; by default it is yesterday
    val cal = Calendar.getInstance()