pyspark-Frequent Pattern Mining


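This post uses a simple example to show how to mine frequent itemsets and generate association rules with the FPGrowth algorithm in Apache Spark. The example builds a small dataset of shopping transactions and applies FPGrowth to find item combinations that occur frequently, together with the confidence of the rules derived from them.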
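To make `minSupport` concrete, the following plain-Python sketch (no Spark required) counts itemsets on the same three transactions used in the PySpark example. It enumerates every candidate itemset by brute force, which is exponential and only viable for toy data. It is not FP-Growth, which avoids candidate generation by building an FP-tree. The threshold `count >= ceil(minSupport * n)` is an assumption about how Spark turns the fractional support into a minimum count, and `frequent_itemsets` is a hypothetical helper name.

```python
from itertools import combinations
from math import ceil

# The same three transactions as the DataFrame in the PySpark example.
transactions = [[1, 2, 5], [1, 2, 3, 5], [1, 2]]


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset enumeration (NOT FP-Growth).

    Returns a dict mapping each frequent itemset (a frozenset) to its count.
    """
    n = len(transactions)
    # Assumption: keep itemsets whose count is at least ceil(min_support * n).
    min_count = ceil(min_support * n)
    baskets = [frozenset(t) for t in transactions]
    all_items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            itemset = frozenset(candidate)
            # Support count: number of transactions containing every item.
            count = sum(1 for b in baskets if itemset <= b)
            if count >= min_count:
                result[itemset] = count
    return result


freq = frequent_itemsets(transactions, min_support=0.5)
# Print by itemset size, then by sorted item list.
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `minSupport=0.5` and three transactions, an itemset must occur in at least 2 of them, so item 3 (which appears only once) drops out and 7 itemsets remain: [1], [2], [5], [1, 2], [1, 5], [2, 5] and [1, 2, 5].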


References:

1. http://spark.apache.org/docs/latest/ml-guide.html

2. https://github.com/apache/spark/tree/v2.2.0

3. http://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html

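from pyspark.ml.fpm import FPGrowth
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; in the pyspark shell, `spark` already exists.
spark = SparkSession.builder.appName("FPGrowthExample").getOrCreate()

# Each row is one transaction: an id plus the list of items bought together.
df = spark.createDataFrame([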
    (0, [1, 2, 5]),
    (1, [1, 2, 3, 5]),
    (2, [1, 2])
], ["id", "items"])

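# minSupport=0.5: keep itemsets that appear in at least half of the transactions.
# minConfidence=0.6: keep association rules whose confidence is at least 0.6.
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)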
model = fpGrowth.fit(df)

# Display frequent itemsets.
model.freqItemsets.show()

# Display generated association rules.
model.associationRules.show()

# transform checks each row's items against all the association rules and
# collects the consequents of the matching rules as the prediction
model.transform(df).show()
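The sketch below reproduces, in plain Python, what `associationRules` and `transform` are described as doing, under these assumptions: each rule has a single-item consequent, confidence is `freq(antecedent ∪ consequent) / freq(antecedent)`, and a row's prediction is the set of consequents from rules whose antecedent is fully contained in the row, minus items the row already has. The frequent-itemset counts are hard-coded for the example data, and `association_rules` and `predict` are hypothetical helper names, not Spark APIs.

```python
# Frequent itemset counts for the three example transactions at minSupport=0.5
# (hard-coded so this snippet runs on its own).
freq = {
    frozenset({1}): 3, frozenset({2}): 3, frozenset({5}): 2,
    frozenset({1, 2}): 3, frozenset({1, 5}): 2, frozenset({2, 5}): 2,
    frozenset({1, 2, 5}): 2,
}


def association_rules(freq, min_confidence):
    """Build rules antecedent => {item} from frequent itemsets of size >= 2."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            # Every subset of a frequent itemset is frequent, so this lookup succeeds.
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, frozenset({item}), confidence))
    return rules


def predict(items, rules):
    """Mimic transform(): gather consequents of rules whose antecedent is a
    subset of `items`, skipping items the row already contains."""
    basket = set(items)
    prediction = []
    for antecedent, consequent, _ in rules:
        if antecedent <= basket:
            for item in consequent:
                if item not in basket and item not in prediction:
                    prediction.append(item)
    return prediction


rules = association_rules(freq, min_confidence=0.6)
for antecedent, consequent, confidence in sorted(
        rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 3))

for row in ([1, 2, 5], [1, 2, 3, 5], [1, 2]):
    print(row, predict(row, rules))
```

On this data every candidate rule clears `minConfidence=0.6` (the lowest confidence is 2/3, e.g. [1] => [5]), giving 9 rules. Rows 0 and 1 already contain every consequent, so their predictions are empty; row 2 ([1, 2]) gets [5].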