Load Data From CSV

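This article shows how to load a dataset from a CSV file, including how to skip blank lines and how to convert string values to floats and integers. Code examples walk through the preprocessing steps: reading the CSV file, checking the dimensions of the data, and converting column types.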


Chapter 1

Loading a dataset from a CSV file

Loading a CSV file

from csv import reader

# Load a CSV file
def load_csv(filename):
    # The with block closes the file once all rows have been read
    with open(filename, "r") as file:
        lines = reader(file)
        dataset = list(lines)
    return dataset

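Here len(dataset) is the number of rows and len(dataset[0]) is the number of columns.
This approach has a limitation: any blank line in the file ends up in the dataset as an empty row. Reading the file row by row and skipping empty rows avoids this: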

from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # A blank line is returned as an empty list; skip it
            if not row:
                continue
            dataset.append(row)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
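
To see the difference between the two loaders without needing the real data file, the sketch below writes a tiny CSV file containing one blank line to a temporary location and loads it both ways. The sample contents and the names load_csv_naive and load_csv_skip_blank are made up for this illustration; they are not part of the original code.

```python
import os
import tempfile
from csv import reader


def load_csv_naive(filename):
    # Keeps every row the reader yields, including [] for a blank line
    with open(filename, 'r') as file:
        return list(reader(file))


def load_csv_skip_blank(filename):
    # Drops rows that the reader returns as empty lists
    dataset = list()
    with open(filename, 'r') as file:
        for row in reader(file):
            if not row:
                continue
            dataset.append(row)
    return dataset


# Hypothetical sample data with one blank line in the middle
sample = "6,148,72\n\n1,85,66\n"
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

naive_rows = load_csv_naive(path)
clean_rows = load_csv_skip_blank(path)
os.remove(path)

print(len(naive_rows))   # 3: the blank line shows up as an empty row
print(len(clean_rows))   # 2: the blank line was skipped
```

The empty row matters later on: len(dataset[0]) or row[column] would fail or mislead if an empty list slipped into the dataset.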

Converting string values to floats

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        # strip() removes surrounding whitespace before the conversion
        row[column] = float(row[column].strip())

For example, after the function has been applied to every column, the row
['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1']
becomes
[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0, 1.0]
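
The function converts one column per call, so converting a whole row means looping over the column indices. The following is a minimal sketch of that loop on a one-row dataset; the in-memory dataset is an assumption made so the example runs without a file.

```python
def str_column_to_float(dataset, column):
    # Convert one column of every row from string to float, in place
    for row in dataset:
        row[column] = float(row[column].strip())


# A one-row dataset held in memory instead of being read from a file
dataset = [['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1']]

# Apply the conversion to each column index in turn
for column in range(len(dataset[0])):
    str_column_to_float(dataset, column)

print(dataset[0])
# [6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0, 1.0]
```

Note that float() raises ValueError for an empty string or non-numeric text, so a header row or a missing value would need to be handled before calling the function.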

Converting strings to integers

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)  # collect the distinct class values
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i  # assign an integer to each class value
    for row in dataset:  # replace each value with its integer code
        row[column] = lookup[row[column]]
    return lookup
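
Because the integers are assigned while iterating over a set of strings, the mapping can differ from one run of the program to the next. If a reproducible mapping is wanted, one possible variation (not the version used in this article) is to sort the distinct values first. The name str_column_to_int_sorted and the sample rows are assumptions for illustration.

```python
def str_column_to_int_sorted(dataset, column):
    # Sort the distinct values so the same value always gets the same integer
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    # Replace each string in the column with its integer code, in place
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup


# Hypothetical sample rows: one measurement plus a class label
rows = [
    ['5.1', 'Iris-setosa'],
    ['7.0', 'Iris-versicolor'],
    ['6.3', 'Iris-virginica'],
    ['4.9', 'Iris-setosa'],
]
lookup = str_column_to_int_sorted(rows, 1)

print(lookup)
# {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print([row[1] for row in rows])
# [0, 1, 2, 0]
```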

Summary

# Example of integer encoding string class values
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # Skip blank lines, which the reader returns as empty lists
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Load iris dataset
filename = 'iris.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
print(dataset[0])
# convert string columns to float
for i in range(4):
    str_column_to_float(dataset, i)
# convert class column to int
lookup = str_column_to_int(dataset, 4)
print(dataset[0])
print(lookup)

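The output is shown below. The integer assigned to each class comes from set iteration order, so the mapping may differ between runs: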

Loaded data file iris.csv with 150 rows and 5 columns 
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'] 
[5.1, 3.5, 1.4, 0.2, 1] 
{'Iris-virginica': 0, 'Iris-setosa': 1, 'Iris-versicolor': 2}
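
The returned lookup maps class names to integers. To go the other way, for example to report a label as its original name, the dictionary can be inverted. This is a small sketch that reuses the mapping and the first row printed above; the names inverse and decoded_label are illustrative.

```python
# Mapping and encoded row copied from the output above
lookup = {'Iris-virginica': 0, 'Iris-setosa': 1, 'Iris-versicolor': 2}
encoded_row = [5.1, 3.5, 1.4, 0.2, 1]

# Swap keys and values to map integers back to class names
inverse = {index: name for name, index in lookup.items()}

decoded_label = inverse[encoded_row[-1]]
print(decoded_label)
# Iris-setosa
```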