Transforming Discrete Features with LabelEncoder and OneHotEncoder

This post shows how to encode discrete features in Python with label encoding and one-hot encoding, a common data-transformation step when preprocessing data for machine learning.
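A minimal sketch of the difference between the two encodings, using a hypothetical toy column of protocol names. LabelEncoder maps each distinct string to one integer. The classes are sorted, so the codes carry an order that means nothing. OneHotEncoder instead gives each category its own 0/1 column. Note that scikit-learn documents LabelEncoder as intended for target labels `y`. For input features, OrdinalEncoder is the feature-oriented equivalent.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one discrete feature with three categories
protocols = np.array(["tcp", "udp", "icmp", "tcp"])

# Label encoding: one integer per category, assigned in sorted order
le = LabelEncoder()
codes = le.fit_transform(protocols)
print(le.classes_)  # ['icmp' 'tcp' 'udp']
print(codes)        # [1 2 0 1]

# One-hot encoding: OneHotEncoder expects 2-D input, hence reshape(-1, 1);
# its default output is a sparse matrix, so densify it with toarray()
enc = OneHotEncoder()
onehot = enc.fit_transform(protocols.reshape(-1, 1)).toarray()
print(onehot)  # one row per sample, one column per category (icmp, tcp, udp)
```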


from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import pandas as pd


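def change_label_encoding(Inputfile_name, Outputfile_name):
    # Label-encode every discrete (string/object) feature in a CSV file
    dis_feature = pd.read_csv(Inputfile_name)

    # Collect the categorical (object-dtype) columns
    categorical_feat = []
    for col in dis_feature.columns.values:
        if dis_feature[col].dtype == 'object':
            categorical_feat.append(col)
        else:
            print('col:', col)  # numeric column, left unchanged

    # Label transformation: map each category to an integer 0..n_classes-1
    for i in categorical_feat:
        le = LabelEncoder()
        dis_feature[i] = le.fit_transform(dis_feature[i])

    # index=False stops pandas from writing the row index as an extra column
    dis_feature.to_csv(Outputfile_name, index=False)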
    

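def change_onehot_encoding(Inputfile_name, Outputfile_name, numeric_feat):
    # Read file
    onehot_feature = pd.read_csv(Inputfile_name)

    # One-hot encode each column listed in numeric_feat (the discrete columns to expand).
    # drop_first=True drops the first category so the dummy columns are not redundant.
    for i in numeric_feat:
        the_cate_col = pd.get_dummies(onehot_feature[i], prefix=i, drop_first=True)
        # Accumulate into onehot_feature itself so every iteration builds on the last
        onehot_feature = pd.concat((onehot_feature, the_cate_col), axis=1)
        onehot_feature.pop(i)  # remove the original column
    onehot_feature.to_csv(Outputfile_name, index=False)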
    
def onehot_encoding(Inputfile_name, Outputfile_name, numeric_feat):
    # Read file (Outputfile_name is not used: the encoded arrays are returned instead)
    onehot_feature = pd.read_csv(Inputfile_name)
    encode_onehot = []
    # One-hot transformation, one column at a time
    for i in numeric_feat:
        # OneHotEncoder expects 2-D input, so select a one-column DataFrame
        dis_feature = onehot_feature[[i]]
        enc = OneHotEncoder()
        b_dis = enc.fit_transform(dis_feature).toarray()
        encode_onehot.append(b_dis)
    return encode_onehot

if __name__ == "__main__":
    change_label_encoding('kddlabel0-2+.csv', 'kddlabel0-3+.csv')
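`change_label_encoding` discards each fitted LabelEncoder. As a result, the integer codes cannot be mapped back, and the same mapping cannot be reapplied to a second file. Below is a minimal sketch of an alternative, assuming the data fit in memory. The hypothetical helper `label_encode_frame` encodes every non-numeric column and returns the fitted encoders keyed by column name, so `inverse_transform` can recover the original strings. It tests `is_numeric_dtype` rather than comparing against `'object'`. That way, string columns are caught whether pandas stores them as `object` or as its dedicated string dtype. The column names are hypothetical toy data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode_frame(df):
    """Label-encode every non-numeric column of df.

    Hypothetical helper: returns (encoded copy, {column: fitted LabelEncoder}).
    """
    out = df.copy()
    encoders = {}
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            le = LabelEncoder()
            out[col] = le.fit_transform(out[col])
            encoders[col] = le  # keep the encoder for inverse_transform / reuse
    return out, encoders


# Hypothetical toy data
df = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp"], "src_bytes": [181, 239, 0]})
encoded, encoders = label_encode_frame(df)
print(encoded)

# Map the integer codes back to the original categories
restored = encoders["protocol_type"].inverse_transform(encoded["protocol_type"])
print(list(restored))  # ['tcp', 'udp', 'icmp']
```

Reusing `encoders[col].transform` on new data raises a `ValueError` for any category that was not seen during fitting.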

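`change_onehot_encoding` expands one column per loop iteration. pandas can also do this in a single call. This sketch uses hypothetical toy data. `columns=` limits encoding to the listed columns and removes the originals. `drop_first=True` drops the first category (in sorted order) of each column. `dtype=int` requests 0/1 integers, since pandas 2.0 and later otherwise return booleans.

```python
import pandas as pd

# Hypothetical toy data: one discrete column and one numeric column
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp", "tcp"],
    "duration": [0, 1, 2, 3],
})

# Encode only protocol_type; non-encoded columns stay first, dummies are appended
encoded = pd.get_dummies(df, columns=["protocol_type"], drop_first=True, dtype=int)
print(encoded)
```

Here the `icmp` category is dropped, so a row with zeros in both `protocol_type_tcp` and `protocol_type_udp` means `icmp`.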
 
