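Illustrated Guide to Product Association Analysis with Spark

Scenario

Market basket analysis:
Find relationships between the items in a data set.
For example, 10% of the customers who buy a phone also buy a phone case.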
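
The 10% in the example is what association-rule mining usually calls the confidence of a rule. As a short reference, the standard definitions are written out here; the symbols are introduced for illustration only: N is the number of transactions (baskets), and freq(X) is the number of transactions that contain every item of the set X.

```latex
\begin{align*}
\mathrm{support}(X) &= \frac{\mathrm{freq}(X)}{N} \\
\mathrm{confidence}(X \Rightarrow Y) &= \frac{\mathrm{freq}(X \cup Y)}{\mathrm{freq}(X)} \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)}
\end{align*}
```

With X = {phone} and Y = {phone case}, the example above corresponds to confidence(X ⇒ Y) = 0.10. A lift above 1 suggests that the two item sets appear together more often than they would if they were independent.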

Association algorithms

Rough analysis with Spark SQL

General principle
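
As a rough sketch of what the two queries below do, the same steps can be written with plain Scala collections (no Spark needed; it can be pasted into the Scala REPL or spark-shell). The names `rows`, `goodSets` and `setCounts` are hypothetical and exist only for this illustration; the sample values are the ones loaded into table t0 below.

```scala
// Rows of (user_id, order_id, good_id), same sample values as table t0
val rows: Seq[(String, String, String)] = Seq(
  ("u1", "o1", "g1"), ("u1", "o1", "g2"),
  ("u2", "o2", "g1"),
  ("u1", "o3", "g1"), ("u1", "o3", "g2"), ("u1", "o3", "g3"),
  ("u3", "o4", "g3"),
  ("u4", "o5", "g1"),
  ("u5", "o6", "g4")
)

// Step 1: like GROUP BY user_id with COLLECT_SET(good_id),
// keeping only the sets that hold more than one item
val goodSets: Seq[Set[String]] = rows
  .groupBy { case (userId, _, _) => userId }
  .values
  .map(group => group.map { case (_, _, goodId) => goodId }.toSet)
  .filter(_.size > 1)
  .toSeq

// Step 2: like GROUP BY good_set with COUNT, most frequent set first
val setCounts: Seq[(Set[String], Int)] = goodSets
  .groupBy(identity)
  .map { case (goodSet, same) => goodSet -> same.size }
  .toSeq
  .sortBy { case (_, count) => -count }

setCounts.foreach(println)
```

On this sample only user u1 bought more than one distinct item, so a single set survives the filter.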

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
// Create the SparkSession (local mode, for demonstration)
val spark: SparkSession = SparkSession.builder().appName("a0").master("local").getOrCreate()
// Reuse the SparkContext owned by the session
val sc: SparkContext = spark.sparkContext
// Enable implicit conversions (needed for toDF)
import spark.implicits._
// Raw order-detail table: (user_id, order_id, good_id)
sc.makeRDD(Seq(
  ("u1", "o1", "g1"),
  ("u1", "o1", "g2"),
  ("u2", "o2", "g1"),
  ("u1", "o3", "g1"),
  ("u1", "o3", "g2"),
  ("u1", "o3", "g3"),
  ("u3", "o4", "g3"),
  ("u4", "o5", "g1"),
  ("u5", "o6", "g4"),
)).toDF("user_id", "order_id", "good_id").createTempView("t0")
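// Collect the goods into one set per group (the query groups by user_id) and keep sets with more than one item.
// SORT_ARRAY makes equal sets compare equal regardless of the order in which COLLECT_SET gathered them.
spark.sql(
  """
    |SELECT SORT_ARRAY(COLLECT_SET(good_id)) AS good_set FROM t0
    |GROUP BY user_id
    |HAVING SIZE(good_set) > 1
    |""".stripMargin).createTempView("t1")
// Count identical sets, most frequent first
spark.sql(
  """
    |SELECT good_set, COUNT(good_set) AS c FROM t1
    |GROUP BY good_set
    |ORDER BY c DESC
    |""".stripMargin).show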

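The SQL result is fairly coarse: it only counts identical item sets, so combinations that occur inside larger sets are not counted.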
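
To illustrate what exact-set counting misses, here is a minimal sketch in plain Scala collections that counts every pair of goods appearing together in a basket. It assumes one basket per order_id (taken from the sample rows of t0), which is an assumption of this sketch rather than what the query above groups by; `baskets` and `pairCounts` are hypothetical names.

```scala
// Baskets per order_id, taken from the sample rows of table t0
// (assumption: one basket per order rather than per user)
val baskets: Seq[Set[String]] = Seq(
  Set("g1", "g2"),        // o1
  Set("g1"),              // o2
  Set("g1", "g2", "g3"),  // o3
  Set("g3"),              // o4
  Set("g1"),              // o5
  Set("g4")               // o6
)

// Emit every unordered pair inside each basket;
// sorting first makes (g1, g2) and (g2, g1) the same key
val pairCounts: Map[(String, String), Int] = baskets
  .flatMap(basket => basket.toSeq.sorted.combinations(2).map(p => (p(0), p(1))))
  .groupBy(identity)
  .map { case (pair, occurrences) => pair -> occurrences.size }

// Most frequent pairs first
pairCounts.toSeq.sortBy { case (pair, count) => (-count, pair) }.foreach(println)
```

Here the pair (g1, g2) is counted twice, once from order o1 and once from the larger basket of order o3, whereas counting identical sets would treat those two baskets as unrelated.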

Co-occurrence frequency

General principle
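
As a rough illustration of the quantities reported by the FP-Growth model below, the following plain-Scala sketch computes itemset frequency, confidence and lift by brute force on the same sample baskets. It is not how FP-Growth works internally (brute-force counting is swapped in for clarity), and the helper names `freq`, `confidence` and `lift` are hypothetical.

```scala
// Baskets per order, same sample values as dwd_order_detail below
val baskets: Seq[Set[String]] = Seq(
  Set("g1", "g2"), Set("g1"), Set("g1", "g2", "g3"),
  Set("g3"), Set("g1"), Set("g4")
)

// Frequency of an itemset: number of baskets containing all of its items
def freq(items: Set[String]): Int =
  baskets.count(basket => items.subsetOf(basket))

// Confidence of antecedent => consequent:
// freq(antecedent union consequent) / freq(antecedent)
def confidence(antecedent: Set[String], consequent: Set[String]): Double =
  freq(antecedent ++ consequent).toDouble / freq(antecedent)

// Lift: confidence divided by the share of baskets containing the consequent
def lift(antecedent: Set[String], consequent: Set[String]): Double =
  confidence(antecedent, consequent) / (freq(consequent).toDouble / baskets.size)

println(freq(Set("g1", "g2")))             // compare with the freq column of freqItemsets
println(confidence(Set("g2"), Set("g1")))  // compare with the confidence column of associationRules
println(lift(Set("g3", "g1"), Set("g2")))  // compare with the lift column of associationRules
```

The printed values can be checked against the freqItemsets and associationRules tables shown further down.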

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
// Create the SparkSession (local mode, for demonstration)
val spark: SparkSession = SparkSession.builder().appName("a0").master("local").getOrCreate()
// Reuse the SparkContext owned by the session
val sc: SparkContext = spark.sparkContext
// Enable implicit conversions (needed for toDF)
import spark.implicits._
// Order-detail table (simulating Hive: SELECT order_id,good_id FROM dwd_order_detail)
sc.makeRDD(Seq(
  ("o1", "g1"),
  ("o1", "g2"),
  ("o2", "g1"),
  ("o3", "g1"),
  ("o3", "g2"),
  ("o3", "g3"),
  ("o4", "g3"),
  ("o5", "g1"),
  ("o6", "g4"),
)).toDF("order_id", "good_id").createTempView("dwd_order_detail")
// Collect the goods of each order into one set
val df = spark.sql(
  """
    |SELECT COLLECT_SET(good_id) FROM dwd_order_detail
    |GROUP BY order_id
    |""".stripMargin).toDF("items") // the model below reads its input from a column named "items" by default
df.show
/*
+------------+
|       items|
+------------+
|        [g3]|
|        [g4]|
|    [g2, g1]|
|[g2, g3, g1]|
|        [g1]|
|        [g1]|
+------------+
*/
// Co-occurrence frequency model (FP-Growth)
import org.apache.spark.ml.fpm.FPGrowth
val fpGrowth = new FPGrowth().setMinSupport(0).setMinConfidence(0)
val model = fpGrowth.fit(df)
println(model)
/*
FPGrowthModel: uid=fpgrowth_352773cafc93, numTrainingRecords=6
*/
// Co-occurrence frequency of goods (frequent itemsets)
model.freqItemsets.show
/*
+------------+----+
|       items|freq|
+------------+----+
|        [g1]|   4|
|        [g2]|   2|
|    [g2, g1]|   2|
|        [g3]|   2|
|    [g3, g2]|   1|
|[g3, g2, g1]|   1|
|    [g3, g1]|   1|
|        [g4]|   1|
+------------+----+
*/
// Association rules between goods
model.associationRules.show
/*
+----------+----------+----------+----+
|antecedent|consequent|confidence|lift|
+----------+----------+----------+----+
|  [g3, g1]|      [g2]|       1.0| 3.0|
|      [g2]|      [g1]|       1.0| 1.5|
|      [g2]|      [g3]|       0.5| 1.5|
|  [g2, g1]|      [g3]|       0.5| 1.5|
|  [g3, g2]|      [g1]|       1.0| 1.5|
|      [g1]|      [g2]|       0.5| 1.5|
|      [g1]|      [g3]|      0.25|0.75|
|      [g3]|      [g2]|       0.5| 1.5|
|      [g3]|      [g1]|       0.5|0.75|
+----------+----------+----------+----+
*/
// Prediction: for each input set, the consequents of the rules whose antecedents it contains
model.transform(Seq(
  "g1",
  "g2",
  "g3",
  "g4",
  "g1 g2",
  "g1 g3",
  "g2 g3",
).map(_.split(" ")).toDF("items")).show
/*
+--------+----------+
|   items|prediction|
+-