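Article outline
Continuing from the previous post: Spark Feature Engineering – Binning (how to implement equal-frequency and equal-width binning)
Introduction – What is a Spark Estimator
The official documentation has a page on this, though it is somewhat dated: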
https://spark.apache.org/docs/latest/ml-pipeline.html
Let's first look at the following core concepts:
Pipeline: a workflow.
A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When Pipeline.fit is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit method will be called on the input dataset to fit a model. Then the model, which is a transformer, will be used to transform the dataset as the input to the nex