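[Machine Learning Series (2)] Supervised Learning in Practice: Algorithms Explained and Model Evaluation
I. Core Supervised Learning Algorithms
1. Linear Regression (predicting continuous values)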
from sklearn.linear_model import LinearRegression
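from sklearn.metrics import mean_squared_error
# Toy dataset
X = [[1], [2], [3], [4]]  # floor area (square meters)
y = [100, 200, 300, 400]  # price (in units of 10,000 yuan)
model = LinearRegression()
model.fit(X, y)
pred = model.predict([[5]])
print(f"Predicted price for a 5 m² home: {pred[0]:.1f} (10,000 yuan)")
# Training-set error; close to 0 here because the toy data is exactly linear
print(f"Training MSE: {mean_squared_error(y, model.predict(X)):.2f}")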
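To make the fit less of a black box, here is a minimal sketch of the closed-form least-squares solution for a single feature, using the same toy numbers. The variable names (`areas`, `prices`, `slope`, `intercept`) are illustrative and not part of the original example; `LinearRegression` solves the same problem in a more general way.

```python
# Closed-form simple linear regression on the same toy data (pure Python)
areas = [1, 2, 3, 4]            # feature values from the example above
prices = [100, 200, 300, 400]   # targets from the example above

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices)) / sum(
    (x - mean_x) ** 2 for x in areas
)
# The fitted line always passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

prediction = slope * 5 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction for 5: {prediction:.1f}")
# prints: slope=100.0, intercept=0.0, prediction for 5: 500.0
```

Because the toy data lies exactly on a line, the hand-computed prediction should agree with `model.predict([[5]])` up to floating-point error.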
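2. Decision Trees (classification and regression)
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the iris dataset
iris = load_iris()
# Hold out a test set (25% of the samples by default); random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
# Train the model; max_depth=3 limits tree depth to reduce overfitting
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2%}")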
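3. Support Vector Machines (SVM)
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load the breast cancer dataset
data = load_breast_cancer()
# Hold out a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)
# Train the model with an RBF kernel; C controls the regularization strength
svm = SVC(kernel='rbf', C=1.0)
svm.fit(X_train, y_train)
print(f"SVM accuracy: {svm.score(X_test, y_test):.2%}")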
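RBF-kernel SVMs are generally sensitive to feature scale, and the breast cancer features span very different ranges. The sketch below is one possible variant, not the original author's code: it wraps `StandardScaler` and the same `SVC` settings in a pipeline. The names `X_tr`, `X_te`, `y_tr`, `y_te`, and `scaled_svm`, and the choice `random_state=0`, are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
# random_state=0 is an arbitrary choice, used only to make the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

# The pipeline fits the scaler on the training data only, then passes the
# standardized features to an SVC with the same kernel and C as above
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scaled_svm.fit(X_tr, y_tr)

accuracy = scaled_svm.score(X_te, y_te)
print(f"SVM accuracy with scaling: {accuracy:.2%}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of the fit, which is why it is usually preferred over scaling the whole dataset up front.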
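II. Practical Feature Engineering Techniques
1. Handling Missing Values
from sklearn.impute import SimpleImputer
import numpy as np
# Build sample data containing one missing value
X = [[1, 2], [np.nan, 3], [7, 6]]
# Replace each NaN with the mean of its column (here (1 + 7) / 2 = 4)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)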
2. Feature Scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform([[30], [20], [25], [35]])
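As a rough check on what standardization does, the following sketch computes z-scores for the same four numbers by hand. It assumes the usual definition z = (x - mean) / std with the population standard deviation (dividing by n), which is what `StandardScaler` uses; the names `values` and `z_scores` are illustrative.

```python
from statistics import fmean, pstdev

values = [30, 20, 25, 35]  # same numbers as the StandardScaler example above

mean = fmean(values)
std = pstdev(values)  # population standard deviation (divides by n, not n - 1)

# Subtract the mean and divide by the standard deviation
z_scores = [(v - mean) / std for v in values]
print([round(z, 3) for z in z_scores])
# prints: [0.447, -1.342, -0.447, 1.342]
```

After scaling, the values have mean 0 and standard deviation 1, so features measured in different units become comparable.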
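3. Encoding Categorical Features
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
colors = [['red'], ['blue'], ['green'], ['red']]
# fit_transform returns a sparse matrix; .toarray() converts it to a dense array
encoded = encoder.fit_transform(colors).toarray()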
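4. Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# X_train and y_train are the breast cancer split from the SVM example (30 features),
# so k=5 is valid; k must not exceed the number of features
selector = SelectKBest(f_classif, k=5)
X_new = selector.fit_transform(X_train, y_train)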
III. Key Model Evaluation Metrics
1. Classification Metrics
from sklearn.metrics import classification_report
y_true = [0, 1, 0, 1]
y_pred = [0, 1, 1, 0]
print(classification_report(y_true, y_pred))
# The report includes, for each class:
# - Precision
# - Recall
# - F1-score
# - Support (number of true samples in the class)
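The per-class numbers in that report can be reproduced by counting true positives, false positives, and false negatives. The helper `prf` below is a hypothetical name introduced for illustration; it is a minimal sketch, not how `classification_report` is implemented.

```python
y_true = [0, 1, 0, 1]
y_pred = [0, 1, 1, 0]

def prf(y_true, y_pred, positive):
    """Return (precision, recall, f1, support) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, tp + fn  # support = number of true samples

for label in (0, 1):
    p, r, f, s = prf(y_true, y_pred, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
# prints:
# class 0: precision=0.50 recall=0.50 f1=0.50 support=2
# class 1: precision=0.50 recall=0.50 f1=0.50 support=2
```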
2. Regression Metrics
from sklearn.metrics import r2_score, mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(f"R² score: {r2_score(y_true, y_pred):.2f}")
print(f"MAE: {mean_absolute_error(y_true, y_pred):.2f}")
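Both metrics are simple to compute by hand, which helps when interpreting them. The sketch below uses the standard definitions R² = 1 - SS_res / SS_tot and MAE = mean(|y - ŷ|) on the same numbers; the variable names are illustrative.

```python
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

n = len(y_true)
mean_true = sum(y_true) / n

# Residual sum of squares: error left over after the model's predictions
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: error of always predicting the mean of y_true
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
print(f"R^2: {r2:.2f}, MAE: {mae:.2f}")
# prints: R^2: 0.95, MAE: 0.50
```

An R² close to 1 means the predictions explain most of the variance in the targets, while MAE reports the average error in the same units as `y_true`.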
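3. Cross-Validation
from sklearn.model_selection import cross_val_score
# clf is the decision tree defined earlier; X_train and y_train are the most recent training split.
# cross_val_score clones clf and refits it on each fold, so the earlier fit is not reused.
scores = cross_val_score(estimator=clf,
                         X=X_train,
                         y=y_train,
                         cv=5,
                         scoring='accuracy')
print(f"Mean cross-validation score: {scores.mean():.2%}")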
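To show what `cv=5` means in terms of data splitting, here is a simplified sketch that partitions sample indices into k folds. The function `k_fold_indices` is a hypothetical helper; it uses contiguous, unshuffled folds, whereas `cross_val_score` with an integer `cv` and a classifier uses stratified folds, so the actual splits will differ.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5), start=1):
    print(f"fold {fold}: validate on {val_idx}, train on {len(train_idx)} samples")
```

Each sample is used for validation exactly once, and the reported score is the average over the k held-out folds.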
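IV. Coming Up Next
Machine Learning Series (3) will take a closer look at:
- Core unsupervised learning algorithms (K-Means/PCA)
- Practical dimensionality reduction techniques
- Methods for evaluating clustering quality
Practical tips:
- Install the dependencies before running the code:
pip install scikit-learn numpy pandas
- Try changing the hyperparameters in the code and observe how the results change
- Run the code snippets step by step in a Jupyter Notebook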
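One simple way to act on the hyperparameter tip is to loop over a few candidate values and compare cross-validated accuracy. The sketch below does this for the decision tree's `max_depth` on the iris data; the candidate values and `random_state=0` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

results = {}
for depth in (1, 2, 3, 5):  # candidate values chosen only for illustration
    # random_state fixes the tree's tie-breaking so runs are repeatable
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, iris.data, iris.target, cv=5, scoring="accuracy")
    results[depth] = scores.mean()
    print(f"max_depth={depth}: mean CV accuracy {results[depth]:.2%}")
```

Comparing cross-validated scores rather than a single test-set score gives a steadier picture of how each setting behaves.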




