Why does a `category` column become dtype('O') after a .to_dict() conversion?

Article link: https://blog.youkuaiyun.com/weixin_62726761/article/details/145670576

When data with the category dtype is converted with .to_dict() and then loaded back into pandas, it typically comes back as dtype('O') (the object dtype). Here is why:

📌 1️⃣ How `.to_dict()` works

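The pandas .to_dict() method converts a DataFrame or Series into a native Python data structure (a dict).
That dict holds plain Python values, so the category dtype is not carried along, and a column rebuilt from it is inferred as object.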

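import pandas as pd

# Create a DataFrame and convert the 'education' column to the category dtype
df = pd.DataFrame({'education': ['Bachelor', 'Master', 'PhD', 'Bachelor']})
df['education'] = df['education'].astype('category')

print(df.dtypes)
# Output:
# education    category
# dtype: object   <- dtype of the dtypes Series itself, not of the column

# Round-trip through .to_dict(): the dict holds plain Python strings,
# so the rebuilt column is no longer category.
# (df.dtypes.to_dict() on its own would still report a CategoricalDtype.)
restored = pd.DataFrame(df.to_dict())
dtype_dict = restored.dtypes.to_dict()
print(dtype_dict)
# Output (on pandas versions that store strings as object):
# {'education': dtype('O')}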

🚨 The column that was category is now dtype('O'), i.e. the object dtype!
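To make the point concrete, here is a minimal sketch that looks inside the dict itself. It assumes the same toy `education` column as above; the variable names `as_dict` and `value_types` are only illustrative.

```python
import pandas as pd

# Same toy data as above, with the column stored as category
df = pd.DataFrame({'education': ['Bachelor', 'Master', 'PhD', 'Bachelor']})
df['education'] = df['education'].astype('category')

# Series.to_dict() maps index labels to plain Python values
as_dict = df['education'].to_dict()
print(as_dict)
# {0: 'Bachelor', 1: 'Master', 2: 'PhD', 3: 'Bachelor'}

# Every value is an ordinary str; nothing in the dict says "category"
value_types = {type(v) for v in as_dict.values()}
print(value_types)
# {<class 'str'>}
```

Because the dict carries only the values, anything that reads it back has to guess the dtype from those values alone.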

📌 2️⃣ Why does `category` become `dtype('O')`?

🔍 Reason: `category` does not survive conversion to plain Python objects

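The category dtype is a pandas-specific optimized dtype: it stores small integer codes plus a table of the distinct categories. A native Python dict has no way to represent that.
.to_dict() emits each value as a plain Python object (here, a str) and drops the dtype.
When pandas reads those strings back, nothing marks the column as categorical, so it is inferred as object.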

dtype('O') is the NumPy object dtype, which pandas has traditionally used to store strings.
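The following sketch illustrates the explanation above: it shows the integer codes and category table that a categorical Series keeps internally, and checks that a Series rebuilt from the dict is no longer categorical. The names `s` and `rebuilt` are illustrative only.

```python
import pandas as pd

s = pd.Series(['Bachelor', 'Master', 'PhD', 'Bachelor'], dtype='category')

# Internally: one small integer code per row ...
print(s.cat.codes.tolist())
# [0, 1, 2, 0]

# ... plus a lookup table of the distinct categories
print(s.cat.categories.tolist())
# ['Bachelor', 'Master', 'PhD']

# The dict contains only the decoded values, so the rebuilt Series
# has no categorical structure to recover
rebuilt = pd.Series(s.to_dict())
print(isinstance(rebuilt.dtype, pd.CategoricalDtype))
# False
```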

📌 3️⃣ How to keep the `category` type information

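If you still want the category dtype after going through .to_dict(), you can convert the column back manually:

# Rebuild from the dict, then use .astype() to convert back to category
df = pd.DataFrame(df.to_dict())
df['education'] = df['education'].astype('category')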

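Alternatively, you can store the dtype information in a separate dict:

# Store the dtypes alongside the data
dtype_dict = {col: str(df[col].dtype) for col in df.columns}
print(dtype_dict)
# Output:
# {'education': 'category'}

That way the fact that the column was category is not lost.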
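Putting both ideas together, one possible round trip looks like the sketch below. The helper names `to_dict_with_categories` and `from_dict_with_categories`, and the `data` / `categories` payload layout, are hypothetical and not part of pandas. The sketch also records the category list and the `ordered` flag, which a bare `'category'` string would not capture.

```python
import pandas as pd

def to_dict_with_categories(df):
    """Hypothetical helper: bundle the data with category metadata."""
    categories = {
        col: {
            'categories': df[col].cat.categories.tolist(),
            'ordered': bool(df[col].cat.ordered),
        }
        for col in df.columns
        if isinstance(df[col].dtype, pd.CategoricalDtype)
    }
    return {'data': df.to_dict(orient='list'), 'categories': categories}

def from_dict_with_categories(payload):
    """Hypothetical helper: rebuild the frame and restore category columns."""
    df = pd.DataFrame(payload['data'])
    for col, meta in payload['categories'].items():
        dtype = pd.CategoricalDtype(meta['categories'], ordered=meta['ordered'])
        df[col] = df[col].astype(dtype)
    return df

df = pd.DataFrame({'education': ['Bachelor', 'Master', 'PhD', 'Bachelor']})
df['education'] = df['education'].astype('category')

payload = to_dict_with_categories(df)
restored = from_dict_with_categories(payload)
print(restored.dtypes)
```

The payload contains only lists, strings, and booleans, so under these assumptions it could also be passed through `json.dumps` if the data needs to leave the process.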

📌 4️⃣ Conclusion

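Round-tripping data through the pandas .to_dict() method does not preserve the category dtype; the rebuilt column comes back as dtype('O') (object).
The reason is that a native Python dict holds only plain values and has no notion of the pandas category dtype.
To keep the category information, either store the dtypes separately or convert the column back to category before using it.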
