There are three ways to convert categories to numbers. The first is one-hot encoding:
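import pandas as pd
data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# dtype=int gives 0/1 indicator columns (recent pandas versions default to booleans)
print(pd.get_dummies(data, dtype=int))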
#    age  level_high  level_low  level_medium
# 0   14           0          1             0
# 1   33           1          0             0
# 2   24           0          0             1
# 3   35           1          0             0
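As a minimal sketch of what the one-hot encoding above produces, the following restricts the encoding to the "level" column and checks that every row has exactly one indicator set to 1. The names encoded and indicator_cols are introduced here for illustration only.

```python
import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})

# Encode only the "level" column; numeric columns such as "age" pass through unchanged
encoded = pd.get_dummies(data, columns=["level"], dtype=int)

# Collect the generated indicator columns (prefixed with the source column name)
indicator_cols = [c for c in encoded.columns if c.startswith("level_")]
print(indicator_cols)  # ['level_high', 'level_low', 'level_medium']

# One-hot means exactly one indicator is 1 in each row
print((encoded[indicator_cols].sum(axis=1) == 1).all())  # True
```

Passing columns explicitly is a design choice that keeps the encoding from silently picking up other string columns added to the frame later.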
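The second approach converts the labels to integer codes automatically, but the codes carry no semantic order. pandas numbers the labels in sorted (alphabetical) order, so "medium", which should fall between "low" and "high", ends up with a larger value than both: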
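data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# cat.codes numbers the labels in sorted order (high=0, low=1, medium=2); +1 makes them start at 1
data["level"] = data["level"].astype("category").cat.codes + 1
print(data)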
#    level  age
# 0      2   14
# 1      1   33
# 2      3   24
# 3      1   35
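If automatic codes are still preferred, one possible way around the alphabetical ordering is to declare the category order up front with pd.Categorical. This is only a sketch: the list order is an assumed ranking (lowest first) and is not part of the original example.

```python
import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})

# Assumed ranking of the labels, lowest first
order = ["low", "medium", "high"]

# Codes now follow the position in `order` instead of alphabetical order
cat = pd.Categorical(data["level"], categories=order, ordered=True)
data["level"] = cat.codes + 1

print(data["level"].tolist())  # [1, 3, 2, 3]
```

A label that does not appear in order gets the code -1 (so 0 after adding 1), which is worth checking for on real data.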
The third approach lets you specify the mapping between categories and numbers explicitly, for example mapping "high" to 3 and "low" to 1:
data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# Explicit label-to-number mapping that preserves the intended order
level_map = {"low": 1, "medium": 2, "high": 3}
data["level"] = data["level"].map(level_map)
print(data)
#    level  age
# 0      1   14
# 1      3   33
# 2      2   24
# 3      3   35
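One caveat with an explicit mapping is that map returns NaN for any label missing from the dictionary. The sketch below shows a simple check for that case; the label "unknown" and the names levels, encoded and unmapped are hypothetical and used for illustration only.

```python
import pandas as pd

level_map = {"low": 1, "medium": 2, "high": 3}

# "unknown" is a hypothetical label that is deliberately absent from level_map
levels = pd.Series(["low", "high", "unknown"])

# Labels without an entry in the mapping become NaN
encoded = levels.map(level_map)

# List the original labels that could not be mapped
unmapped = levels[encoded.isna()].unique().tolist()
print(unmapped)  # ['unknown']
```

Running such a check before training or aggregation makes a misspelled or new category visible instead of letting it turn into missing values unnoticed.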