One-Hot Encoding: A Python sklearn CTR Experiment


License: CC 4.0 BY-SA


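This article walks through a data-preprocessing workflow in Python: reading rows from a MySQL table, converting the resulting tuple of tuples into a NumPy array, One-Hot encoding that array into a categorical feature matrix with sklearn's OneHotEncoder, and writing the encoded result to a text file.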
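To make the encoding step concrete before the database is involved, here is a minimal sketch in plain Python of what column-wise One-Hot encoding computes. The function name `one_hot_columns` and the sample rows are hypothetical and do not come from the original table. The layout follows OneHotEncoder's default behavior as I understand it (categories sorted within each column, encoded columns concatenated left to right), but this is an illustration, not sklearn's implementation.

```python
def one_hot_columns(rows):
    """One-hot encode each column of a list of equal-length rows."""
    n_cols = len(rows[0])
    # Collect the sorted distinct values seen in each column
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            # One indicator per category of column j; exactly one is 1.0
            out.extend(1.0 if value == c else 0.0 for c in categories[j])
        encoded.append(out)
    return categories, encoded


# Hypothetical sample: 4 rows, 3 categorical columns
rows = [(0, 0, 3), (1, 1, 0), (0, 2, 1), (1, 0, 2)]
categories, encoded = one_hot_columns(rows)
for row in encoded:
    print(row)
```

In this sample the three columns have 2, 3 and 4 distinct values, so every encoded row has 2 + 3 + 4 = 9 entries, with exactly one 1.0 per original column.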

import numpy as np
from sklearn.preprocessing import OneHotEncoder
import MySQLdb  # on Python 3 this module is provided by the mysqlclient package


# Connect to the local MySQL server
conn = MySQLdb.connect(host='localhost', user='root', passwd='Zhouy2008', port=3306)
cursor = conn.cursor()


# Select the database
conn.select_db('ml_test')


# Query all rows from the one_hot table
sql = "select * from one_hot"
cursor.execute(sql)
data = cursor.fetchall()
cursor.close()
conn.close()


# fetchall() returns a tuple of tuples; convert it to an ndarray
print('now, transform data type from tuple to ndarray')
print('data_arr:\n')
data_arr = np.array(data)
for row in data_arr:
    print(row)


# One-hot encoding
enc = OneHotEncoder()
enc.fit(data_arr)
data_hoted = enc.transform(data_arr).toarray()
print()
print('enc.transform(data_arr).toarray():\n', data_hoted)


# Write data_hoted (an ndarray) to a text file, one row per print call
num_rows, num_cols = data_hoted.shape
print('writing to text file')
with open("G:/one_hoted.txt", "w") as f:
    for i in range(num_rows):
        print(data_hoted[i, :], file=f)
print('finished writing to text file')
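Writing each row with print keeps NumPy's display formatting: the brackets are written to the file, and a long row can wrap across several lines. One alternative, sketched below under the assumption that a plain space-separated 0/1 matrix is the desired file format, is `np.savetxt`. The array `demo_hoted` is a hypothetical stand-in for `data_hoted`, and the in-memory buffer is only there so the sketch runs without touching the disk.

```python
import io

import numpy as np

# Hypothetical stand-in for the encoded matrix (data_hoted in the script above)
demo_hoted = np.array([[1.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0, 0.0]])

# Write to an in-memory buffer; pass a file path instead to write to disk
buf = io.StringIO()
np.savetxt(buf, demo_hoted, fmt='%d', delimiter=' ')
text = buf.getvalue()
print(text)
```

Each matrix row becomes exactly one line in the output, which makes the file easy to load again with `np.loadtxt`.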