Code
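from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import Imputer, StandardScaler  # Imputer was removed in later scikit-learn releases

# DataFrameSelector and CombinedAttributesAdder are custom transformers defined elsewhere.
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

# Numerical columns: select, impute with the median, add derived attributes, standardize.
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

# Categorical column: select, then binarize the labels.
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', MyLabelBinarizer()),  # defined further down; LabelBinarizer() here raises the error
])

# Run both pipelines and concatenate their outputs.
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared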
Error
Running the sklearn Pipeline raises an error:
Error message: fit_transform() takes 2 positional arguments but 3 were given
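Diagnosis
The error occurs because the pipeline calls LabelBinarizer's fit_transform method with three arguments (self, x, y), as if it were defined like this:
def fit_transform(self, x, y):
    ...rest of the code
whereas LabelBinarizer's fit_transform() method is actually defined with only two parameters:
def fit_transform(self, x):
    ...rest of the code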
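The mismatch can be reproduced without scikit-learn at all. The sketch below is only an illustration: TwoArgBinarizer and run_last_step are hypothetical stand-ins (not scikit-learn code) for a transformer whose fit_transform takes only the data, and for the way a pipeline passes both X and y to its final step.

```python
class TwoArgBinarizer:
    """Hypothetical stand-in: fit_transform accepts only self and x."""

    def fit_transform(self, x):
        return [[1] if value == "yes" else [0] for value in x]


def run_last_step(step, X, y=None):
    """Simplified stand-in for how a pipeline invokes its final step."""
    # Three positional arguments in total: step (bound as self), X, and y.
    return step.fit_transform(X, y)


def reproduce_error():
    """Return the TypeError message raised by the mismatched call, if any."""
    try:
        run_last_step(TwoArgBinarizer(), ["yes", "no"])
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce_error()
print(message)
```

The printed message ends with the same wording as the error above; newer Python versions also prefix the method name with the class name.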
Solution
Wrap LabelBinarizer in a custom class whose methods accept the third argument:
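from sklearn.base import TransformerMixin  # provides the fit_transform method for free
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        # Forward all constructor arguments to the wrapped encoder.
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=None):
        # y is accepted only to match the pipeline's call; it is ignored.
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)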
Keep the rest of your code the same; simply use the class created above, MyLabelBinarizer(), in place of LabelBinarizer().
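As a quick check, the wrapper can be called the way the pipeline calls its final step, with both X and y. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the category values are made-up sample data, and the direct fit_transform(X, None) call only imitates the pipeline's call rather than running a full Pipeline.

```python
import numpy as np
from sklearn.base import TransformerMixin
from sklearn.preprocessing import LabelBinarizer


class MyLabelBinarizer(TransformerMixin):
    """Wrapper whose fit/transform tolerate the extra y argument."""

    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=None):
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)


# Illustrative sample values, not taken from the original data.
categories = np.array(["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND"])

# Plain LabelBinarizer rejects the extra positional argument.
try:
    LabelBinarizer().fit_transform(categories, None)
    plain_call_failed = False
except TypeError:
    plain_call_failed = True

# The wrapper accepts it and returns one column per class.
encoded = MyLabelBinarizer().fit_transform(categories, None)
print(plain_call_failed)
print(encoded.shape)
```

The wrapper inherits fit_transform(X, y) from TransformerMixin, which is why defining only fit and transform is enough here.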