pandas.read_excel详解

本文介绍如何利用Python的Pandas库来读取Excel文件,并详细解释了read_excel函数的参数配置,包括指定读取特定工作表、跳过标题行等功能。
部署运行你感兴趣的模型镜像
#coding:utf-8
import pandas as pd
import numpy as np

filefullpath = r"/home/geeklee/temp/all_gov_file/pol_gov_mon/downloads/1.xls"
#filefullpath = r"/home/geeklee/temp/all_gov_file/pol_gov_mon/downloads/26368f3a-ea03-46b9-8033-73615ed07816.xls"
df = pd.read_excel(filefullpath,skiprows=[0])
#df = pd.read_excel(filefullpath, sheetname=[0,2],skiprows=[0])
#sheetname指定为读取几个sheet,sheet数目从0开始
#如果sheetname=[0,2],那代表读取第0页和第2页的sheet
#skiprows=[0]代表读取跳过的行数第0行,不写代表不跳过标题
#df = pd.read_excel(filefullpath, sheetname=None ,skiprows=[0])

print df
print type(df)
#若果有多页,type(df)就为<type 'dict'>
#如果就一页,type(df)就为<class 'pandas.core.frame.DataFrame'>
#{0:dataframe,1:dataframe,2:dataframe}
pandas.read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, engine=None, squeeze=False, **kwds)

Read an Excel table into a pandas DataFrame


参数解析:

io : string, path object (pathlib.Path or py._path.local.LocalPath),

    file-like object, pandas ExcelFile, or xlrd workbook. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/workbook.xlsx

sheetname : string, int, mixed list of strings/ints, or None, default 0

    Strings are used for sheet names, Integers are used in zero-indexed sheet positions.

    Lists of strings/integers are used to request multiple sheets.

    Specify None to get all sheets.

    str|int -> DataFrame is returned. list|None -> Dict of DataFrames is returned, with keys representing sheets.

    Available Cases

        Defaults to 0 -> 1st sheet as a DataFrame
        1 -> 2nd sheet as a DataFrame
        “Sheet1” -> 1st sheet as a DataFrame
        [0,1,”Sheet5”] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
        None -> All sheets as a dictionary of DataFrames

header : int, list of ints, default 0

    Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex

skiprows : list-like

    Rows to skip at the beginning (0-indexed)

skip_footer : int, default 0

    Rows at the end to skip (0-indexed)

index_col : int, list of ints, default None

    Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column. If a list is passed, those columns will be combined into a MultiIndex

names : array-like, default None

    List of column names to use. If file contains no header row, then you should explicitly pass header=None

converters : dict, default None

    Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.

parse_cols : int or list, default None

        If None then parse all columns,
        If int then indicates last column to be parsed
        If list of ints then indicates list of column numbers to be parsed
        If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)

squeeze : boolean, default False

    If the parsed data only contains one column then return a Series

na_values : list-like, default None

    List of additional strings to recognize as NA/NaN

thousands : str, default None

    Thousands separator for parsing string columns to numeric. Note that this parameter is only necessary for columns stored as TEXT in Excel, any numeric columns will automatically be parsed, regardless of display format.

keep_default_na : bool, default True

    If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they’re appended to

verbose : boolean, default False

    Indicate number of NA values placed in non-numeric columns

engine: string, default None

    If io is not a buffer or path, this must be set to identify io. Acceptable values are None or xlrd

convert_float : boolean, default True

    convert integral floats to int (i.e., 1.0 –> 1). If False, all numeric data will be read in as floats: Excel stores all numbers as floats internally

has_index_names : boolean, default None

    DEPRECATED: for version 0.17+ index names will be automatically inferred based on index_col. To read Excel output from 0.16.2 and prior that had saved index names, use True.

return返回的结果

parsed : DataFrame or Dict of DataFrames

    DataFrame from the passed in Excel file. See notes in sheetname argument for more information on when a Dict of Dataframes is returned.

您可能感兴趣的与本文相关的镜像

Python3.10

Python3.10

Conda
Python

Python 是一种高级、解释型、通用的编程语言,以其简洁易读的语法而闻名,适用于广泛的应用,包括Web开发、数据分析、人工智能和自动化脚本

<think> 根据用户需求,用户希望获取pandas库中pd.read_excel函数的详细解释。结合提供的引用内容,我将重点介绍该函数的功能、常用参数和使用示例。 引用[1][2][3]都详细介绍了pd.read_excel函数,因此我将整合这些信息,并按照以下结构进行说明: 1. 函数概述 2. 常用参数详解 3. 示例代码 4. 注意事项 注意:由于引用中提到了该函数支持读取xls和xlsx格式,以及如何读取多个工作表等,这些信息将包含在参数详解中。 另外,引用[4]是关于pd.to_datetime()的,与当前问题无关,故忽略。 下面开始组织回答: </think>### pandas库中`pd.read_excel()`函数详解 `pd.read_excel()`是pandas库中用于读取Excel文件的核心函数,支持`.xls`和`.xlsx`格式,可将Excel数据转换为DataFrame格式进行处理。以下是关键参数和用法详解: #### **一、核心参数解析** 1. **`io`** (必需) - Excel文件路径或类文件对象 - 示例:`pd.read_excel("data/sales.xlsx")` - 支持URL路径(如`https://example.com/data.xlsx`)[^3] 2. **`sheet_name`** - 指定读取的工作表: - 默认`0`(第一个工作表) - 可传工作表名称(`"Sheet1"`) - 传`None`读取所有工作表→返回`{表名: DataFrame}`字典[^2] - 示例:`sheet_name=["订单", "库存"]` 读取指定多个表 3. **`header`** 和 **`names`** - `header`:指定列名行(默认`0`即首行) - `names`:自定义列名列表(如`names=["ID","产品","价格"]`)[^3] 4. **`usecols`** - 选择特定列: - 列字母范围:`"A:C"`(A列到C列) - 列索引列表:`[0,2,4]` - 函数筛选:`usecols=lambda x: "单价" in x`[^2] 5. **`dtype`** - 强制列数据类型: - 示例:`dtype={"ID": str, "数量": float}` 防止数值ID被转为整数[^3] 6. **`na_values`** - 自定义缺失值标识: - 示例:`na_values=["N/A", "缺失"]` 将这些值转为NaN[^2] #### **二、特殊场景处理** 1. **读取不规则表格** - `skiprows`:跳过前N行(如标题下方有空白行) - `skipfooter`:跳过末尾N行(如备注信息) - 示例:`skiprows=3, skipfooter=2`[^3] 2. **处理大型文件** - `chunksize`:分块读取(返回迭代器) ```python for chunk in pd.read_excel("large_data.xlsx", chunksize=1000): process(chunk) # 逐块处理 ``` 3. **解析日期列** - `parse_dates`:自动解析日期列 - 示例:`parse_dates=["下单日期"]`[^4] #### **三、完整示例** ```python import pandas as pd # 读取指定工作表,跳过前2行,自定义列名,筛选列 df = pd.read_excel( io="data/2023_sales.xlsx", sheet_name="月度汇总", skiprows=2, names=["日期", "产品", "销量", "销售额"], usecols="B:E", dtype={"销量": int}, na_values=["缺货"] ) print(df.head()) ``` #### **四、常见问题解决** - **路径转义问题**:使用原始字符串或双反斜杠 ```python df = pd.read_excel(r"C:\data\file.xlsx") # 推荐 # 或 df = pd.read_excel("C:\\data\\file.xlsx") ``` - **编码问题**:通过`encoding`参数指定(如`encoding="gbk"`)[^3] - **公式读取**:默认读取公式计算结果,需原始公式时用`openpyxl`库直接操作 > **提示**:复杂Excel建议先用`pd.ExcelFile`创建对象再解析,可复用连接: > ```python > with pd.ExcelFile("data.xlsx") as xls: > df1 = pd.read_excel(xls, "Sheet1") > df2 = pd.read_excel(xls, "Sheet2") > ```
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值