Cris 的 Python 数据分析笔记 05：Pandas 数据读取，索引，切片，计算，列整合，过滤，最值

最新推荐文章于 2023-11-29 14:44:19 发布

cris_zz

最新推荐文章于 2023-11-29 14:44:19 发布

阅读量1.1k

点赞数

CC 4.0 BY-SA版权

分类专栏： Python Pandas 文章标签： Python Pandas Numpy 数据分析基础语法

本文链接：https://blog.youkuaiyun.com/cris_zz/article/details/84333615

Pandas 数据读取，索引，切片，计算，列整合，过滤，最值

文章目录

- - Pandas 数据读取，索引，切片，计算，列整合，过滤，最值

1. read_csv 函数

import pandas as pd

'''
    xxx.csv 文件就是以 ， 分割的二维数据
    在 Pandas 中，核心数据结构就是 DataFrame，类似于 NumPy 的 Ndaaray（矩阵）
    DataFrame 数据的 dtypes 属性可以显示 .csv 文件每一列数据的数据类型，在 Pandas 中，整数就是 int64 类型；
    小数就是 float64 类型；字符串就是 object 类型
    read_csv 函数很重要哦！！！
'''
data = pd.read_csv('food_info.csv')
print(type(data))
print(data.dtypes)
print(help(pd.read_csv))

<class 'pandas.core.frame.DataFrame'>
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object
Help on function read_csv in module pandas.io.parsers:

read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
    Read CSV (comma-separated) file into DataFrame
    
    Also supports optionally iterating or breaking of the file
    into chunks.
    
    Additional help can be found in the `online docs for IO Tools
    <http://pandas.pydata.org/pandas-docs/stable/io.html>`_.
    
    Parameters
    ----------
    filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any \
    object with a read() method (such as a file handle or StringIO)
        The string could be a URL. Valid URL schemes include http, ftp, s3, and
        file. For file URLs, a host is expected. For instance, a local file could
        be file://localhost/path/to/table.csv
    sep : str, default ','
        Delimiter to use. If sep is None, the C engine cannot automatically detect
        the separator, but the Python parsing engine can, meaning the latter will
        be used and automatically detect the separator by Python's builtin sniffer
        tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
        different from ``'\s+'`` will be interpreted as regular expressions and
        will also force the use of the Python parsing engine. Note that regex
        delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``
    delimiter : str, default ``None``
        Alternative argument name for sep.
    delim_whitespace : boolean, default False
        Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be
        used as the sep. Equivalent to setting ``sep='\s+'``. If this option
        is set to True, nothing should be passed in for the ``delimiter``
        parameter.
    
        .. versionadded:: 0.18.1 support for the Python parser.
    
    header : int or list of ints, default 'infer'
        Row number(s) to use as the column names, and the start of the
        data.  Default behavior is to infer the column names: if no names
        are passed the behavior is identical to ``header=0`` and column
        names are inferred from the first line of the file, if column
        names are passed explicitly then the behavior is identical to
        ``header=None``. Explicitly pass ``header=0`` to be able to
        replace existing names. The header can be a list of integers that
        specify row locations for a multi-index on the columns
        e.g. [0,1,3]. Intervening rows that are not specified will be
        skipped (e.g. 2 in this example is skipped). Note that this
        parameter ignores commented lines and empty lines if
        ``skip_blank_lines=True``, so header=0 denotes the first line of
        data rather than the first line of the file.
    names : array-like, default None
        List of column names to use. If file contains no header row, then you
        should explicitly pass header=None. Duplicates in this list will cause
        a ``UserWarning`` to be issued.
    index_col : int or sequence or False, default None
        Column to use as the row labels of the DataFrame. If a sequence is given, a
        MultiIndex is used. If you have a malformed file with delimiters at the end
        of each line, you might consider index_col=False to force pandas to _not_
        use the first column as the index (row names)
    usecols : list-like or callable, default None
        Return a subset of the columns. If list-like, all elements must either
        be positional (i.e. integer indices into the document columns) or strings
        that correspond to column names provided either by the user in `names` or
        inferred from the document header row(s). For example, a valid list-like
        `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element
        order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
        To instantiate a DataFrame from ``data`` with element order preserved use
        ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
        in ``['foo', 'bar']`` order or
        ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
        for ``['bar', 'foo']`` order.
    
        If callable, the callable function will be evaluated against the column
        names, returning names where the callable function evaluates to True. An
        example of a valid callable argument would be ``lambda x: x.upper() in
        ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
        parsing time and lower memory usage.
    squeeze : boolean, default False
        If the parsed data only contains one column then return a Series
    prefix : str, default None
        Prefix to add to column numbers when no header, e.g. '