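Python: A class-based implementation of data preprocessing for numpy arrays (standardization, normalization, mean scaling, min-max, etc.)

This post tidies up some code and utilities I have written. For data science, the first thing I wanted to cover is data preprocessing. Wrapping the methods in a class makes it fairly easy to bundle many preprocessing methods together. The source code follows.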


import numpy as np
import pandas as pd

"""This file is used to do data pre-processing.
The class DataProcessingForNumpyArray contains the methods shown below:
standarlization: standardization (Z-score)
normalization: scaling to the range [0, 1]
equalization: mean scaling (division by the mean)
min_max_negation: min-max reversal (max + min - x)"""

class DataProcessingForNumpyArray:
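    def __init__(self,data):
        """Store the data to be processed"""
        # Work on a float array: assigning float results into an integer
        # array in place would silently truncate them.
        self.data = np.asarray(data, dtype=float)
        (self.r, self.c) = self.data.shape

    """Standardize the data"""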
    def standarlization(self, r_or_c = 0, selected = None):
        """Arguments:
        r_or_c: int that determines whether the function processes data by rows or columns,
            0 refers to rows and 1 refers to columns, 0 is the default value
        selected: list of the rows or columns to be processed,
            if not defined, the function will process all data
        """
        def Z_Score(data):
            mean = np.mean(data)
            std_dev = np.std(data)  # population standard deviation (ddof=0)
            norm_data = (data - mean) / std_dev
            return norm_data
        if selected is None:
            selected = list(range(self.data.shape[r_or_c]))
        if r_or_c == 0:
            for i in selected:
                data = self.data[i,:]
                self.data[i,:] = Z_Score(data)
        elif r_or_c == 1:
            for j in selected:
                data = self.data[:,j]
                self.data[:,j] = Z_Score(data)

    """Normalize the data"""
    def normalization(self, r_or_c = 0, selected = None):
        """Arguments:
        r_or_c: int that determines whether the function processes data by rows or columns,
            0 refers to rows and 1 refers to columns, 0 is the default value
        selected: list of the rows or columns to be processed,
            if not defined, the function will process all data
        """
        def unil(data):
            # Scale the values linearly into the range [0, 1]
            data = (data - np.min(data)) / (np.max(data) - np.min(data))
            return data
        if selected is None:
            selected = list(range(self.data.shape[r_or_c]))
        if r_or_c == 0:
            for i in selected:
                data = self.data[i,:]
                self.data[i,:] = unil(data)
        elif r_or_c == 1:
            for j in selected:
                data = self.data[:,j]
                self.data[:,j] = unil(data)

    """Equalize the data"""
    def equalization(self, r_or_c = 0, selected = None):
        """Arguments:
        r_or_c: int that determines whether the function processes data by rows or columns,
            0 refers to rows and 1 refers to columns, 0 is the default value
        selected: list of the rows or columns to be processed,
            if not defined, the function will process all data
        """
        if selected is None:
            selected = list(range(self.data.shape[r_or_c]))
        if r_or_c == 0:
            for i in selected:
                data = self.data[i,:]
                self.data[i,:] = self.data[i,:]/np.mean(data)
        elif r_or_c == 1:
            for j in selected:
                data = self.data[:,j]
                self.data[:,j] = self.data[:,j]/np.mean(data)

    """Reverse the data within its min-max range"""
    def min_max_negation(self,r_or_c =0,selected = None):
        """Arguments:
        r_or_c: int that determines whether the function processes data by rows or columns,
            0 refers to rows and 1 refers to columns, 0 is the default value
        selected: list of the rows or columns to be processed,
            if not defined, the function will process all data
        """
        if selected is None:
            selected = list(range(self.data.shape[r_or_c]))
        if r_or_c == 0:
            for i in selected:
                data = self.data[i,:]
                ma = np.max(data)
                mi = np.min(data)
                self.data[i,:] = ma - data + mi
        elif r_or_c == 1:
            for j in selected:
                data = self.data[:,j]
                ma = np.max(data)
                mi = np.min(data)
                # Same reversal as the row branch (the original divided by the mean here)
                self.data[:,j] = ma - data + mi

    """Save data as an excel file"""
    def save_data(self,filename,savepath = None):
        """filename: a string ending with ".xlsx"
        savepath: an available directory path; it is concatenated directly
            with filename, so it must end with a path separator"""
        df = pd.DataFrame(self.data)
        if not savepath:
            df.to_excel(filename)
        else:
            df.to_excel(savepath + filename)

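The class has five methods: four for data preprocessing and one for saving the data. What each preprocessing method does is evident from its name. To use the class, instantiate it, call the methods you need, and save the result if required.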
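For reference, the four transforms can be written as follows. This is a reading of the code above, not an additional specification: x is one selected row or column, μ and σ are its mean and standard deviation (np.std with default settings, i.e. the population standard deviation), and the min_max_negation formula is the one used in the method's row branch.

```latex
\begin{aligned}
\text{standarlization:}\quad   & x' = \frac{x - \mu}{\sigma} \\
\text{normalization:}\quad     & x' = \frac{x - \min(x)}{\max(x) - \min(x)} \\
\text{equalization:}\quad      & x' = \frac{x}{\mu} \\
\text{min\_max\_negation:}\quad & x' = \max(x) + \min(x) - x
\end{aligned}
```

Note that the first formula is undefined when σ = 0 and the second when max(x) = min(x), so a constant row or column will produce NaN or infinite values.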

Step 1: instantiating the class requires a numpy array whose elements are all int or float.
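One caveat about integer input is worth illustrating. The methods write their results back into the array in place, and numpy casts values to the array's dtype on assignment. The sketch below (variable names are illustrative only) shows what happens with an integer array and how converting to float beforehand avoids it.

```python
import numpy as np

# Integer input: in-place assignment casts the float results back to int.
int_data = np.array([[1, 2, 3], [4, 5, 6]])
int_data[0, :] = int_data[0, :] / np.mean(int_data[0, :])  # computes 0.5, 1.0, 1.5
print(int_data[0, :])  # the fractional parts have been dropped

# Converting to float first keeps the fractional values.
float_data = np.array([[1, 2, 3], [4, 5, 6]]).astype(float)
float_data[0, :] = float_data[0, :] / np.mean(float_data[0, :])
print(float_data[0, :])
```

If the input may be an integer array, calling `.astype(float)` on it before instantiating the class is a simple safeguard.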

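Step 2: call any of the four preprocessing methods on the instance. They share the same parameters. r_or_c is an integer, 0 or 1, that selects processing by row or by column: 0 treats each row as one indicator, and 1 treats each column as one indicator. selected is an optional list of the row or column indices to process; if it is omitted, all rows or columns are processed.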
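The meaning of r_or_c and selected can be sketched with a standalone, vectorized Z-score function. This is not the class's own implementation: `zscore_by` is a hypothetical helper that mirrors the parameter semantics described above and returns a copy instead of modifying the input.

```python
import numpy as np

def zscore_by(data, r_or_c=0, selected=None):
    """Return a float copy in which the selected rows (r_or_c=0) or
    columns (r_or_c=1) are replaced by their Z-scores."""
    out = np.array(data, dtype=float)  # copy, so the input is left untouched
    if selected is None:
        selected = list(range(out.shape[r_or_c]))
    if r_or_c == 0:
        # Each row is one indicator: statistics are taken along axis=1.
        block = out[selected, :]
        mean = block.mean(axis=1, keepdims=True)
        std = block.std(axis=1, keepdims=True)
        out[selected, :] = (block - mean) / std
    elif r_or_c == 1:
        # Each column is one indicator: statistics are taken along axis=0.
        block = out[:, selected]
        mean = block.mean(axis=0, keepdims=True)
        std = block.std(axis=0, keepdims=True)
        out[:, selected] = (block - mean) / std
    else:
        raise ValueError("r_or_c must be 0 or 1")
    return out

sample = np.array([[1.0, 2.0, 3.0],
                   [10.0, 20.0, 60.0]])
by_rows = zscore_by(sample, r_or_c=0)                       # every row standardized
first_col_only = zscore_by(sample, r_or_c=1, selected=[0])  # only column 0 changes
```

With `selected=[0]` and `r_or_c=1`, only the first column is standardized and the other columns keep their original values.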

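Step 3: saving requires a filename ending in ".xlsx" and, optionally, a save path. If no save path is given, the file is written relative to the current working directory, which is usually the folder the program is run from.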
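Because save_data joins savepath and filename by plain string concatenation, the save path has to end with a path separator. A more forgiving way to build the target path is sketched below with os.path.join; `build_output_path` is a hypothetical helper, not part of the class.

```python
import os

def build_output_path(filename, savepath=None):
    """Return the path the Excel file would be written to."""
    if not filename.endswith(".xlsx"):
        raise ValueError("filename must end with '.xlsx'")
    if not savepath:
        return filename  # relative to the current working directory
    # os.path.join inserts a separator only when savepath lacks one
    return os.path.join(savepath, filename)

with_sep = build_output_path("test.xlsx", "datas" + os.sep)
without_sep = build_output_path("test.xlsx", "datas")
default_path = build_output_path("test.xlsx")
```

Both `with_sep` and `without_sep` point to the same file. Also note that pandas needs an Excel writer engine such as openpyxl to be installed for `to_excel` to write ".xlsx" files.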

Usage example, where "data.xlsx" is the input data table; the code skips its first column and processes the remaining ones:

save_path = 'C:\\Users\\Technicolor\\Desktop\\pythonProject\\test\\datas\\'
Data = pd.read_excel("data.xlsx")
data = Data.values[:,1:]
dt = DataProcessingForNumpyArray(data)
dt.equalization(r_or_c=1)
dt.save_data(filename="test.xlsx",savepath=save_path)
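The same flow can be tried without an Excel file. The sketch below builds a small in-memory DataFrame as a stand-in for "data.xlsx"; its column names and values are made up for illustration, and it assumes the first column holds row labels. It also shows why an explicit float conversion can help: when a frame mixes text and numbers, `.values` yields an object array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.xlsx: one label column, two numeric columns.
frame = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height": [1.0, 2.0, 3.0],
    "weight": [10.0, 20.0, 30.0],
})

mixed = frame.values[:, 1:]
print(mixed.dtype)  # object, because the frame mixes strings and numbers

# Select the numeric columns first, then convert to a float array.
numeric = frame.iloc[:, 1:].to_numpy(dtype=float)

# Column-wise mean scaling, the vectorized counterpart of equalization(r_or_c=1).
equalized = numeric / numeric.mean(axis=0, keepdims=True)
print(equalized.mean(axis=0))  # each column now averages to 1
```

Passing `numeric` rather than `mixed` to the class avoids numpy functions that do not support object arrays.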
