Python Machine Learning Basics, Notes 3: Loading Data (Cookbook)

Article link: https://blog.youkuaiyun.com/weixin_43702920/article/details/95177310

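This post covers how to load datasets when doing machine learning in Python. It walks through several data formats: CSV files (from a URL and from a local file), Excel files, JSON files, and SQL database queries.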


Loading datasets

# Load scikit-learn's datasets
from sklearn import datasets

# Load the digits dataset (handwritten digits)
digits = datasets.load_digits()

# Create features matrix
features = digits.data

# Create target vector
target = digits.target

# View first observation
features[0]

Some of the available datasets:

load_boston
Contains 506 observations on Boston housing prices. It is a good dataset for
exploring regression algorithms. Note that load_boston was removed in
scikit-learn 1.2, so it is only available in older versions.
load_iris
Contains 150 observations on the measurements of Iris flowers. It is a good
dataset for exploring classification algorithms.
load_digits
Contains 1,797 observations from images of handwritten digits. It is a good
dataset for teaching image classification.
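
As a quick check of the sizes listed above, the following minimal sketch loads two of the bundled datasets and inspects the shape of each features matrix. It uses only load_iris and load_digits; the variable names are illustrative and not taken from the original notes.

```python
# Load scikit-learn's datasets
from sklearn import datasets

# Load two of the bundled datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

# Each loader returns an object with a features matrix (.data)
# and a target vector (.target)
iris_shape = iris.data.shape
digits_shape = digits.data.shape

print(iris_shape)
print(digits_shape)

# Each digits observation is an 8x8 image flattened to 64 values,
# so it can be reshaped back into its image form
first_image = digits.data[0].reshape(8, 8)
```

The first element of each shape is the number of observations, and the second is the number of features per observation.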

CSV file

From a URL:

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_data'

# Load dataset
dataframe = pd.read_csv(url)

# View first two rows
dataframe.head(2)

From a local file:

# Load dataset ('path' is a placeholder for the path to your CSV file)
dataframe = pd.read_csv(r'path')
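
To try the local-file case without an existing CSV on disk, here is a small self-contained sketch. It writes a tiny made-up file to a temporary directory and reads it back; the file names, column names, and values are all assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Build a tiny CSV file in a temporary directory (made-up sample data)
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("integer,category\n5,0\n9,1\n3,0\n")

# Load the local file; the first row is used as the header by default
dataframe = pd.read_csv(csv_path)

# A file without a header row needs header=None and explicit column names
no_header_path = os.path.join(tmp_dir, "no_header.csv")
with open(no_header_path, "w", encoding="utf-8") as f:
    f.write("5,0\n9,1\n3,0\n")
dataframe_no_header = pd.read_csv(
    no_header_path, header=None, names=["integer", "category"]
)

# View first two rows
print(dataframe.head(2))
```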

Excel file

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_excel'

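# Load data (the keyword is sheet_name; the older sheetname spelling
# is no longer accepted by current pandas versions)
dataframe = pd.read_excel(url, sheet_name=0, header=1)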

# View the first two rows
dataframe.head(2)

Note: sheet_name accepts both strings containing the name of the sheet and
integers pointing to sheet positions (zero-indexed). To load multiple sheets,
pass them as a list. For example, sheet_name=[0, 1, 2, "Monthly Sales"] returns
a dictionary of pandas DataFrames containing the first, second, and third
sheets and the sheet named Monthly Sales.

JSON file

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_json'

# Load data
dataframe = pd.read_json(url, orient='columns')

# View the first two rows
dataframe.head(2)

Note: the orient parameter tells pandas how the JSON file is structured. It
might take some experimenting to figure out which argument (split, records,
index, columns, or values) is the right one. Another helpful tool pandas offers
is json_normalize (available as pd.json_normalize in pandas 1.0 and later),
which can help convert semistructured JSON data into a pandas DataFrame.
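
To make the effect of orient concrete, the sketch below parses the same two rows from two different JSON layouts and then flattens a nested structure with json_normalize. The JSON strings and field names are made up for illustration, and the text is read from in-memory strings rather than from the URL above.

```python
from io import StringIO

import pandas as pd

# The same two rows encoded with two different layouts (made-up sample data)
records_json = '[{"integer": 5, "category": 0}, {"integer": 9, "category": 1}]'
split_json = (
    '{"columns": ["integer", "category"], '
    '"index": [0, 1], "data": [[5, 0], [9, 1]]}'
)

# orient must match the layout of the JSON text
from_records = pd.read_json(StringIO(records_json), orient="records")
from_split = pd.read_json(StringIO(split_json), orient="split")

# json_normalize flattens nested objects into dotted column names
nested = [
    {"id": 1, "info": {"name": "a", "score": 0.5}},
    {"id": 2, "info": {"name": "b", "score": 0.7}},
]
flat = pd.json_normalize(nested)

print(from_records)
print(flat.columns.tolist())
```

Both read_json calls produce the same table. Passing an orient that does not match the layout typically raises an error or yields a differently shaped DataFrame, which is why some experimenting can be needed.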

SQL database access

# Load libraries
import pandas as pd
from sqlalchemy import create_engine

# Create a connection to the database
database_connection = create_engine('sqlite:///sample.db')

# Load data
dataframe = pd.read_sql_query('SELECT * FROM data', database_connection)

# View first two rows
dataframe.head(2)
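
The snippet above assumes that a sample.db file with a table named data already exists. As a self-contained alternative, the sketch below builds an in-memory SQLite database first. Note the substitution: it uses a connection from Python's standard sqlite3 module, which pandas also accepts, instead of a SQLAlchemy engine. The table contents are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so no sample.db file is needed
connection = sqlite3.connect(":memory:")

# Create a small table named data (made-up sample rows)
sample = pd.DataFrame({"first_name": ["Jason", "Molly"], "age": [42, 52]})
sample.to_sql("data", connection, index=False)

# Load every row of the table
dataframe = pd.read_sql_query("SELECT * FROM data", connection)

# Load only matching rows, passing the value as a bound parameter
older = pd.read_sql_query(
    "SELECT * FROM data WHERE age > ?", connection, params=(45,)
)

# View first two rows
print(dataframe.head(2))

connection.close()
```

Passing values through params rather than formatting them into the SQL string lets the database driver handle quoting.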