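I previously gave a brief introduction to the woebin function in the scorecardpy library. This post goes through its parameters one by one.
First, here is the parameter documentation of the underlying function. It is long, so feel free to skip it: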
```python
def woebin(dt, y, x=None,
           var_skip=None, breaks_list=None, special_values=None,
           stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
           # min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
           positive="bad|1", no_cores=None, print_step=0, method="tree",
           ignore_const_cols=True, ignore_datetime_cols=True,
           check_cate_num=True, replace_blank=True,
           save_breaks_list=None, **kwargs):
    '''
    WOE Binning
    ------
    `woebin` generates optimal binning for numerical, factor and categorical
    variables using methods including tree-like segmentation or chi-square
    merge. woebin can also customizing breakpoints if the breaks_list or
    special_values was provided.

    The default woe is defined as ln(Distr_Bad_i/Distr_Good_i). If you
    prefer ln(Distr_Good_i/Distr_Bad_i), please set the argument `positive`
    as negative value, such as '0' or 'good'. If there is a zero frequency
    class when calculating woe, the zero will replaced by 0.99 to make the
    woe calculable.

    Params
    ------
    dt: A data frame with both x (predictor/feature) and y (response/label) variables.
    y: Name of y variable.
    x: Name of x variables. Default is None. If x is None,
      then all variables except y are counted as x variables.
    var_skip: Name of variables that will skip for binning. Defaults to None.
    breaks_list: List of break points, default is None.
      If it is not None, variable binning will based on the
      provided breaks.
    special_values: the values specified in special_values
      will be in separate bins. Default is None.
    count_distr_limit: The minimum percentage of final binning
```
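The docstring combines two ideas: user-supplied `breaks_list`/`special_values` define the bins, and each bin's WOE is ln(Distr_Bad_i/Distr_Good_i), with zero counts replaced by 0.99. The sketch below illustrates both in plain Python under stated assumptions. It is not scorecardpy's implementation.

- `assign_bin` and `woe_table` are hypothetical helpers.
- Bins are assumed to be left-closed, right-open intervals `[a, b)`.
- Missing values (`None`) get their own bin.
- `y == 1` is treated as "bad", mirroring the default `positive="bad|1"`.
- Zero counts are replaced by 0.99 before the distributions are computed. This is one reading of the docstring.

```python
import math
from collections import Counter


def assign_bin(value, breaks, special_values):
    """Map one value to a bin label (hypothetical helper, not scorecardpy's API).

    Missing values (None) and special values each get their own bin; all other
    values fall into left-closed, right-open intervals [a, b) built from the
    sorted breakpoints, padded with -inf and +inf (an assumption of this sketch).
    """
    if value is None:
        return "missing"
    if value in special_values:
        return str(value)
    edges = [-math.inf] + sorted(breaks) + [math.inf]
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo <= value < hi:
            return f"[{lo},{hi})"
    raise ValueError(f"value {value!r} does not fall into any bin")


def woe_table(xs, ys, breaks, special_values=(), zero_fill=0.99):
    """Per-bin WOE = ln(Distr_Bad_i / Distr_Good_i), treating y == 1 as bad.

    A bin with zero bad (or good) observations gets zero_fill instead of 0, so
    the logarithm stays finite; distributions are then taken over the filled
    counts (assumed interpretation of the docstring, not library code).
    """
    bad, good = Counter(), Counter()
    for x, y in zip(xs, ys):
        label = assign_bin(x, breaks, special_values)
        if y == 1:
            bad[label] += 1
        else:
            good[label] += 1

    labels = set(bad) | set(good)
    # Counter returns 0 for absent labels, so `or` substitutes the fill value.
    b = {k: bad[k] or zero_fill for k in labels}
    g = {k: good[k] or zero_fill for k in labels}
    total_b, total_g = sum(b.values()), sum(g.values())
    return {k: math.log((b[k] / total_b) / (g[k] / total_g)) for k in labels}


if __name__ == "__main__":
    xs = [1, 3, 3, 6, 8, -1, None, 4]
    ys = [0, 0, 1, 1, 1, 0, 1, 0]
    # breaks [2, 5] -> [-inf,2), [2,5), [5,inf); -1 gets its own bin.
    table = woe_table(xs, ys, breaks=[2, 5], special_values=[-1])
    for label in sorted(table):
        print(f"{label:>10}  woe={table[label]:+.4f}")
```

A bin that holds more bads than goods, relative to the overall shares, gets a positive WOE under this sign convention. Flipping `positive` (for example to `'good'`) would flip the sign, as the docstring describes.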
