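A bug had me stuck for several days, so I am writing down how I dealt with it. I have not fully thought it through myself; this is only a record, and I hope to resolve it more thoroughly later.
How the problem arose:
While training random forest and similar models
df.head()
Printed output: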
   track_id                track_name  popularity  duration_ms  explicit  artists                     artists_id              release_date  danceability  energy  ...  r&b  mandopop  japanese  rap  hong  indie  kong  house  singer  songwriter
0  5KpWHEh32vzxkttIK3KHKI  國際孤獨等級      51          193747       False     ['Gareth.T']                6R57JlNKlnNrYaji0vw8xx  2023-03-03    0.692         0.189   ...  0    0         0         0    1     0      1     0      0       0
1  1sb71AvysPMJlsx4qYtTpG  緊急聯絡人       58          222668       False     ['Gareth.T']                6R57JlNKlnNrYaji0vw8xx  2023-11-30    0.513         0.373   ...  0    0         0         0    1     0      1     0      0       0
2  2mMgDVazhRjNoOweYMP1pz  青春告別式       50          256967       False     ['Hins Cheung']             2MVfNjocvNrE03cQuxpsWK  2023-12-31    0.433         0.380   ...  0    0         0         0    0     0      0     0      0       0
3  6UuJk5rvrxSnOAwv6uSr5b  給你幸福 所以幸福   51          244693       False     ['Jay Fung']                4EXI1ieJe2VDbvNsKOaNQL  2023-10-24    0.414         0.456   ...  0    0         0         0    0     0      0     0      0       0
4  1mUhvuqX0ScGodDTdnRtuL  永久損毀        49          232002       False     ['MC 張天賦', 'Panther Chan']  5tRk0bqMQubKAVowp35XtC  2023-12-19    0.514         0.405   ...  0    0         0         0    0     0      0     0      0       0

5 rows × 45 columns
The above is the data.
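from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hold out 40% of the rows as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)
# Inspect the column names and the dtype of the column index
print(x_train.columns.tolist())
print(x_train.columns.dtype)
# TODO: workaround from approach 2, added after the error first appeared;
# without this line, fit() below raises the TypeError
x_train = x_train.rename(str, axis="columns")
# Decision tree model
dt_model = DecisionTreeRegressor(random_state=0)
dt_model.fit(x_train, y_train)
y_pred_dt = dt_model.predict(x_test)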
The call dt_model.fit(x_train, y_train) raises an error:
TypeError: Feature names are only supported if all input features have string names, but your input has ['str', 'str_'] as column name types
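The message lists the distinct types found among the column labels, here both str and str_ (the latter being numpy.str_). To see which labels belong to which type, a small check along the following lines can help. This is only a sketch: the helper name column_name_types is hypothetical and the sample labels are made up. On the real data it would be called as column_name_types(x_train.columns).

```python
import numpy as np


def column_name_types(labels):
    """Group labels by the exact type name of each label (hypothetical helper)."""
    groups = {}
    for label in labels:
        groups.setdefault(type(label).__qualname__, []).append(label)
    return groups


# Made-up labels: one plain Python str and one numpy.str_.
demo_labels = ["energy", np.str_("indie")]
found = column_name_types(demo_labels)
print(found)
```

Any label listed under a type other than str is a candidate for causing the error.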
Approach 1
Convert the type of the column names:
x.columns = x.columns.astype(str)
print(x.columns.dtype)
print(x.columns.tolist())
for col_name in x.columns:
    assert isinstance(col_name, str)
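The core here is really just x.columns = x.columns.astype(str); that one line is enough.
After running the code above, the dtype of the columns is reported as object, which is how pandas stores strings, and the isinstance assertion passes. Yet dt_model.fit(x_train, y_train) still raises the same error. Why this approach fails is not yet known.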
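One possible explanation, offered as an assumption rather than a verified cause: numpy.str_ is a subclass of str, so isinstance(col_name, str) passes for such a label even though its exact type is not str, and the error message suggests that scikit-learn compares exact types. The sketch below shows the subclass relationship and then prints what astype(str) leaves behind on a made-up index, so the behaviour can be checked on the installed pandas version instead of being taken on trust.

```python
import numpy as np
import pandas as pd

label = np.str_("indie")

# numpy.str_ inherits from str, so an isinstance check cannot tell them apart.
passes_isinstance = isinstance(label, str)
exact_type_name = type(label).__qualname__
is_exact_str = type(label) is str
print(passes_isinstance, exact_type_name, is_exact_str)

# Made-up index mixing both label types; inspect the label types after astype(str).
mixed = pd.Index(["energy", label], dtype=object)
converted = mixed.astype(str)
print(sorted({type(c).__qualname__ for c in converted}))
```

If the last line still prints str_ among the types, astype(str) did not replace the numpy.str_ labels, which would be consistent with the error persisting.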
Approach 2
x_train= x_train.rename(str,axis="columns")
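I found this alternative on StackOverflow, and to my surprise it worked right away. The principle is similar to approach 1, and I have not yet worked out why this one succeeds.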
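A plausible reading, stated as an assumption and not a confirmed cause: rename(str, axis="columns") calls str() on every label one by one, and str() applied to a numpy.str_ value gives back a plain Python str. The sketch below uses a made-up two-column frame to show the label types after the rename, together with an explicit list comprehension that should have the same effect.

```python
import numpy as np
import pandas as pd

# Made-up frame with one plain str label and one numpy.str_ label.
mixed = pd.Index(["energy", np.str_("house")], dtype=object)
frame = pd.DataFrame([[0.189, 0], [0.373, 1]], columns=mixed)

# rename(str, axis="columns") applies str() to each column label.
renamed = frame.rename(str, axis="columns")
types_after = {type(label).__qualname__ for label in renamed.columns}
print(types_after)

# Alternative spelling (an assumption that it is equivalent here):
# rebuild the labels explicitly with str().
frame.columns = [str(label) for label in frame.columns]
types_explicit = {type(label).__qualname__ for label in frame.columns}
print(types_explicit)
```

Since predict also validates feature names, the same conversion probably needs to be applied to x_test as well, or once to x before calling train_test_split.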