(Python notes) Parameters of cut in pandas

Latest recommended article published on 2024-12-04 16:34:19

weixin_46790209



Article link: https://blog.youkuaiyun.com/weixin_46790209/article/details/123140678


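This post walks through the pandas `cut` function, which discretizes numeric data by splitting it into bins and converting it to categories. Its parameters include the input array `x`, the binning criterion `bins` (a number of bins or explicit bin edges), whether bins are closed on the right (`right`), custom labels (`labels`), and others. By varying these parameters you can bin with equal or custom widths and control whether the lowest value is included. The post also covers how duplicate bin edges are handled and whether the bin edges are returned. The function is very useful in data analysis and feature engineering.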


def cut(
    x,
    bins,
    right: bool = True,
    labels=None,
    retbins: bool = False,
    precision: int = 3,
    include_lowest: bool = False,
    duplicates: str = "raise",
    ordered: bool = True,
):
    """
    Bin values into discrete intervals.

    Use `cut` when you need to segment and sort data values into bins. This
    function is also useful for going from a continuous variable to a
    categorical variable. For example, `cut` could convert ages to groups of
    age ranges. Supports binning into an equal number of bins, or a
    pre-specified array of bins.

    Parameters
    ----------
    x : array-like
        The input array to be binned. Must be 1-dimensional.
    bins : int, sequence of scalars, or IntervalIndex
        The criteria to bin by.

        * int : Defines the number of equal-width bins in the range of `x`. The
          range of `x` is extended by .1% on each side to include the minimum
          and maximum values of `x`.
        * sequence of scalars : Defines the bin edges allowing for non-uniform
          width. No extension of the range of `x` is done.
        * IntervalIndex : Defines the exact bins to be used. Note that
          IntervalIndex for `bins` must be non-overlapping.

    right : bool, default True
        Indicates whether `bins` includes the rightmost edge or not. If
        ``right == True`` (the default), then the `bins` ``[1, 2, 3, 4]``
        indicate (1,2], (2,3], (3,4]. This argument is ignored when
        `bins` is an IntervalIndex.
    labels : array or False, default None
        Specifies the labels for the returned bins. Must be the same length as
        the resulting bins. If False, returns only integer indicators of the
        bins. This affects the type of the output container (see below).
        This argument is ignored when `bins` is an IntervalIndex. If True,
        raises an error. When `ordered=False`, labels must be provided.
    retbins : bool, default False
        Whether to return the bins or not. Useful when bins is provided
        as a scalar.
    precision : int, default 3
        The precision at which to store and display the bins labels.
    include_lowest : bool, default False
        Whether the first interval should be left-inclusive or not.
    duplicates : {default 'raise', 'drop'}, optional
        If bin edges are not unique, raise ValueError or drop non-uniques.
    ordered : bool, default True
        Whether the labels are ordered or not. Applies to returned types
        Categorical and Series (with Categorical dtype). If True,
        the resulting categorical will be ordered. If False, the resulting
        categorical will be unordered (labels must be provided).

Main purpose: discretize the array x into the bins described by bins.
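As a quick illustration of that purpose, here is a minimal sketch that bins a handful of ages into three equal-width groups. The `ages` data and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: six ages to be grouped.
ages = pd.Series([5, 17, 30, 45, 62, 80])

# bins=3 asks for three equal-width intervals covering the range of the data.
groups = pd.cut(ages, bins=3)

# The result is a Series with categorical dtype; each value is an interval.
print(groups)

# The integer position of the bin each age fell into.
print(groups.cat.codes.tolist())  # [0, 0, 0, 1, 2, 2]
```

Because the input is a Series, the result is also a Series. Passing a plain list would return a `Categorical` instead.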


x: the array-like object to bin; it must be 1-dimensional.

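bins: the binning criterion. An int gives the number of equal-width bins; a sequence of scalars gives the bin edges directly, so widths may differ; an IntervalIndex gives the exact bins to use.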
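The docstring above lists both an int and a sequence of scalars as valid values for bins. The sketch below contrasts the two on invented `scores` data and also uses retbins=True to get back the edges pandas computed for the int case.

```python
import pandas as pd

# Hypothetical sample data.
scores = [35, 58, 60, 72, 88, 95]

# int: four equal-width bins over the range of the data.
# retbins=True additionally returns the computed bin edges.
equal_width, edges = pd.cut(scores, bins=4, retbins=True)
print(len(edges))  # 5 edges delimit 4 bins

# sequence of scalars: explicit edges, so the bins may have different widths.
custom = pd.cut(scores, bins=[0, 60, 80, 100])
print(custom.codes.tolist())  # [0, 0, 0, 1, 2, 2]
```

Note that 60 lands in the first custom bin, because bins are closed on the right by default.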

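right: bool = True: whether each bin is closed on the right, i.e. whether it includes its right edge. With the default True, bins [1, 2, 3, 4] mean (1, 2], (2, 3], (3, 4].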
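A small sketch of the effect of right, reusing the edges [1, 2, 3, 4] from the docstring. The values are chosen to sit exactly on the edges so the difference is visible; a code of -1 marks a value that fell outside every bin (shown as NaN in the result).

```python
import pandas as pd

values = [1, 2, 3, 4]
edges = [1, 2, 3, 4]

# Default right=True: intervals are (1, 2], (2, 3], (3, 4].
right_closed = pd.cut(values, bins=edges)
print(right_closed.codes.tolist())  # [-1, 0, 1, 2] -> 1 is not in any bin

# right=False: intervals are [1, 2), [2, 3), [3, 4).
left_closed = pd.cut(values, bins=edges, right=False)
print(left_closed.codes.tolist())  # [0, 1, 2, -1] -> 4 is not in any bin
```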

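labels=None: labels for the resulting bins; there must be exactly one label per bin. Example: labels=range(0, 4) for four bins. Passing labels=False returns integer bin indicators instead of labels.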
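The following sketch shows three ways of using labels on the same invented `ages` data: custom names, integer labels in the spirit of labels=range(0, 4), and labels=False. The bin edges and label names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data and edges; four bins need four labels.
ages = [3, 15, 25, 40, 70]
edges = [0, 12, 18, 60, 100]

# Custom text labels, one per bin.
named = pd.cut(ages, bins=edges, labels=["child", "teen", "adult", "senior"])
print(list(named))  # ['child', 'teen', 'adult', 'adult', 'senior']

# Integer labels 0..3, still returned as a Categorical.
numbered = pd.cut(ages, bins=edges, labels=list(range(0, 4)))
print(list(numbered))  # [0, 1, 2, 2, 3]

# labels=False: a plain integer array of bin positions, not a Categorical.
codes = pd.cut(ages, bins=edges, labels=False)
print(codes.tolist())  # [0, 1, 2, 2, 3]
```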

precision: the precision (number of decimal places) used to store and display the bin labels; the default is 3.
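To see what precision changes, the sketch below bins the same made-up values twice. Only the displayed interval labels should differ; the assignment of values to bins stays the same.

```python
import pandas as pd

# Hypothetical sample data with many decimal places.
values = [0.1234567, 0.5, 0.9876543]

coarse = pd.cut(values, bins=2, precision=1)
fine = pd.cut(values, bins=2, precision=4)

# The interval labels are rounded differently...
print(coarse.categories)
print(fine.categories)

# ...but every value falls into the same bin position either way.
print(coarse.codes.tolist() == fine.codes.tolist())  # True
```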

include_lowest: whether the first interval is closed on the left, i.e. whether the lowest edge itself is included.
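A short sketch of include_lowest, using invented values where the smallest one sits exactly on the lowest edge. A code of -1 again marks a value that was not assigned to any bin.

```python
import pandas as pd

values = [0, 5, 10]
edges = [0, 5, 10]

# Default: the first interval is (0, 5], so 0 itself is left out.
default = pd.cut(values, bins=edges)
print(default.codes.tolist())  # [-1, 0, 1]

# include_lowest=True: the first interval also covers the lowest edge.
inclusive = pd.cut(values, bins=edges, include_lowest=True)
print(inclusive.codes.tolist())  # [0, 0, 1]
```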

duplicates: what to do when the bin edges are not unique. 'raise' (the default) raises a ValueError; 'drop' removes the duplicate edges.
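Per the docstring, non-unique bin edges either raise a ValueError or are dropped. The sketch below tries both settings on hypothetical edges that repeat the value 2.

```python
import pandas as pd

values = [1, 2, 3, 4]
edges = [0, 2, 2, 4]  # the edge 2 appears twice

# Default duplicates="raise": repeated edges are rejected.
try:
    pd.cut(values, bins=edges)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# duplicates="drop": the repeated edge is removed, leaving edges 0, 2, 4.
dropped = pd.cut(values, bins=edges, duplicates="drop")
print(len(dropped.categories))  # 2
print(dropped.codes.tolist())   # [0, 0, 1, 1]
```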

ordered: whether the resulting categorical is ordered; the default is True. With ordered=False, labels must be provided.
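Finally, a sketch of ordered. It assumes a pandas version whose `cut` accepts this parameter, as in the signature quoted above; the data and label names are invented. Labels are passed in both calls because the docstring requires them when ordered=False.

```python
import pandas as pd

# Hypothetical sample data, edges and labels.
sizes = [1, 5, 9]
edges = [0, 3, 6, 10]
labels = ["small", "medium", "large"]

# Default ordered=True: the categories carry an order (small < medium < large).
ordered_cat = pd.cut(sizes, bins=edges, labels=labels)
print(ordered_cat.ordered)  # True

# ordered=False: same categories, but without an order between them.
unordered_cat = pd.cut(sizes, bins=edges, labels=labels, ordered=False)
print(unordered_cat.ordered)  # False
```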