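I. Overview of Artificial Intelligence: Definition, History, and the Modern Landscape
1.1 Definition and Classification of Artificial Intelligence
Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems, created by humans, that simulate, extend, and augment human intelligence.
1.1.1 Three Categories of AI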
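| Type | Definition | Characteristics | Current status |
|---|---|---|---|
| Weak AI (Narrow AI) | Performs well on a specific task | Focused on a single domain; no general intelligence | Current mainstream (essentially all deployed AI applications) |
| Strong AI (General AI) | Has human-level general intelligence | Can understand, learn, and reason about any task | Theoretical (not yet achieved) |
| Super AI | Exceeds human-level intelligence | Outperforms humans in every domain | Hypothetical (a science-fiction concept) |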
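1.1.2 The AI Technology Stack at a Glance
graph TD
A[Artificial Intelligence] --> B[Machine Learning]
A --> C[Deep Learning]
A --> D[Natural Language Processing]
A --> E[Computer Vision]
A --> F[Reinforcement Learning]
A --> G[Knowledge Graphs]
B --> B1[Supervised Learning]
B --> B2[Unsupervised Learning]
B --> B3[Semi-supervised Learning]
B --> B4[Reinforcement Learning]
C --> C1[Convolutional Neural Networks]
C --> C2[Recurrent Neural Networks]
C --> C3[Transformer]
C --> C4[Generative Adversarial Networks]
D --> D1[Text Classification]
D --> D2[Sentiment Analysis]
D --> D3[Machine Translation]
D --> D4[Question Answering]
E --> E1[Image Classification]
E --> E2[Object Detection]
E --> E3[Image Segmentation]
E --> E4[Face Recognition]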
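1.2 A Brief History of Artificial Intelligence
- 1950: Turing proposes the "Turing test", laying a theoretical foundation for AI
- 1956: The Dartmouth Conference, where the term "artificial intelligence" is formally coined
- 1980s: Expert systems rise, bringing the first wave of AI commercialization
- 1997: IBM Deep Blue defeats world chess champion Garry Kasparov
- 2012: AlexNet achieves a breakthrough result in the ImageNet competition, opening the deep learning era
- 2016: AlphaGo defeats Go world champion Lee Sedol
- 2020 to present: The large-model era, in which pretrained models such as GPT and BERT lead the AI revolution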
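1.3 The Modern AI Industry Landscape
1.3.1 Main Technical Directions
| Field | Core technologies | Application scenarios | Representative companies |
|---|---|---|---|
| Computer vision | CNN, YOLO, Transformer | Face recognition, autonomous driving, medical imaging | SenseTime, Megvii, Baidu |
| Natural language processing | BERT, GPT, T5 | Intelligent customer service, machine translation, content generation | OpenAI, Google, Alibaba |
| Speech technology | RNN, WaveNet, Whisper | Speech recognition, speech synthesis, smart speakers | iFlytek, Apple, Amazon |
| Recommender systems | Collaborative filtering, deep learning | E-commerce recommendation, content distribution, ad delivery | ByteDance, Tencent, Netflix |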
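1.3.2 AI Talent Demand
Skill requirements for AI-related roles, based on 2025 recruitment data:
| Role | Required skills | Nice-to-have skills | Average salary |
|---|---|---|---|
| Algorithm engineer | Python, PyTorch/TensorFlow, mathematical foundations | Large models, distributed training | ¥35,000/month |
| Data scientist | Python, SQL, statistics, machine learning | Deep learning, A/B testing | ¥28,000/month |
| AI product manager | Product design, understanding of AI technology, data analysis | Technical background, project management | ¥25,000/month |
| MLOps engineer | Docker, Kubernetes, CI/CD, monitoring | Model deployment, performance optimization | ¥32,000/month |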
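II. Mathematical Foundations of Artificial Intelligence
2.1 Linear Algebra
2.1.1 Vector and Matrix Operations
import numpy as np
import matplotlib.pyplot as plt
# Vector operations
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
# Dot (inner) product
dot_product = np.dot(v1, v2)  # 1*4 + 2*5 + 3*6 = 32
# Cross product of the two 3-D vectors (2-D inputs to np.cross are deprecated in NumPy 2.0)
cross_product = np.cross(v1, v2)  # [-3, 6, -3]
# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = A @ B
# Transpose
A_T = A.T
# Inverse (A is invertible because det(A) = -2)
A_inv = np.linalg.inv(A)
# Eigenvalues and eigenvectors
eigenvals, eigenvecs = np.linalg.eig(A)
print(f"Dot product: {dot_product}")
print(f"Cross product: {cross_product}")
print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvals}")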
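2.1.2 Eigenvalue Decomposition and Singular Value Decomposition
Eigenvalue decomposition (EVD), for a diagonalizable square matrix:
$$A = Q \Lambda Q^{-1}$$
where $Q$ is the matrix whose columns are the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.
Singular value decomposition (SVD), for any real matrix:
$$A = U \Sigma V^{T}$$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is the (rectangular) diagonal matrix of singular values.
Applications: PCA dimensionality reduction, recommender systems, image compression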
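As a rough illustration of why SVD is useful for compression and dimensionality reduction, the sketch below builds a small synthetic matrix (the sizes, noise level, and the helper name `low_rank` are assumptions made for this example) and reconstructs it from only its two largest singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 6x5 matrix that is rank 2 plus a little noise (illustrative data only)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(6, 5))

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def low_rank(k):
    """Rank-k approximation built from the k largest singular values."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A2 = low_rank(2)
rel_err = np.linalg.norm(A - A2) / np.linalg.norm(A)
print(f"Singular values: {np.round(s, 3)}")
print(f"Relative error of the rank-2 approximation: {rel_err:.4f}")
```

The Frobenius-norm error of the rank-2 approximation equals the root of the sum of the squared discarded singular values, which is why rapidly decaying singular values allow aggressive truncation.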
2.2 Probability and Statistics
2.2.1 Bayes' Theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
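To make the formula concrete, here is a small worked calculation. The probabilities are hypothetical numbers chosen only for illustration: they describe how often a given word appears in spam and in legitimate mail.

```python
# Hypothetical probabilities, chosen only for illustration
p_spam = 0.2              # P(spam): prior probability that an email is spam
p_word_given_spam = 0.6   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

# Law of total probability: P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | word) = {p_spam_given_word:.3f}")  # prints 0.750
```

Seeing the word raises the probability of spam from the 20% prior to 75%, because the word is twelve times more likely in spam than in legitimate mail.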
Application example: spam filtering
# A simple naive Bayes spam classifier
import numpy as np

class NaiveBayesSpamFilter:
    def __init__(self):
        self.spam_word_probs = {}
        self.ham_word_probs = {}
        self.p_spam = 0.5  # prior probability of spam

    def train(self, emails, labels):
        """Train the classifier; labels are 1 for spam and 0 for ham."""
        spam_emails = [email for email, label in zip(emails, labels) if label == 1]
        ham_emails = [email for email, label in zip(emails, labels) if label == 0]
        # Prior probability
        self.p_spam = len(spam_emails) / len(emails)
        # Word counts per class
        spam_words = self._extract_words(spam_emails)
        ham_words = self._extract_words(ham_emails)
        total_spam_words = sum(spam_words.values())
        total_ham_words = sum(ham_words.values())
        # Conditional probabilities with Laplace (add-one) smoothing
        vocab = set(spam_words.keys()) | set(ham_words.keys())
        for word in vocab:
            spam_count = spam_words.get(word, 0)
            ham_count = ham_words.get(word, 0)
            self.spam_word_probs[word] = (spam_count + 1) / (total_spam_words + len(vocab))
            self.ham_word_probs[word] = (ham_count + 1) / (total_ham_words + len(vocab))

    def predict(self, email):
        """Return 1 if the email is predicted to be spam, otherwise 0."""
        words = self._tokenize(email)
        # Work with log posteriors to avoid numerical underflow
        log_p_spam = np.log(self.p_spam)
        log_p_ham = np.log(1 - self.p_spam)
        for word in words:
            # Both dictionaries share one vocabulary; words unseen in training are skipped
            if word in self.spam_word_probs:
                log_p_spam += np.log(self.spam_word_probs[word])
                log_p_ham += np.log(self.ham_word_probs[word])
        return 1 if log_p_spam > log_p_ham else 0

    def _extract_words(self, emails):
        """Count word occurrences across a list of emails."""
        word_count = {}
        for email in emails:
            words = self._tokenize(email)
            for word in words:
                word_count[word] = word_count.get(word, 0) + 1
        return word_count

    def _tokenize(self, text):
        """Naive tokenizer: lowercase and split on whitespace."""
        return text.lower().split()
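2.2.2 Probability Distributions
| Distribution | Density / mass function | Application scenarios |
|---|---|---|
| Normal | $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | Error analysis, feature standardization |
| Bernoulli | $P(X=1)=p,\ P(X=0)=1-p$ | Binary classification |
| Poisson | $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$ | Event counts (e.g. website visits) |
| Exponential | $f(x)=\lambda e^{-\lambda x},\ x \ge 0$ | Waiting times, reliability analysis |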
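One quick way to get a feel for these distributions is to sample from them and compare empirical moments with the theoretical ones. The sketch below uses NumPy's `Generator` API; the parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Arbitrary illustrative parameters; the theoretical mean is noted beside each draw
samples = {
    "normal": rng.normal(loc=2.0, scale=3.0, size=n),       # mean = mu = 2
    "bernoulli": rng.binomial(n=1, p=0.3, size=n),          # mean = p = 0.3
    "poisson": rng.poisson(lam=4.0, size=n),                # mean = lambda = 4
    "exponential": rng.exponential(scale=1 / 0.5, size=n),  # rate 0.5, mean = 1/rate = 2
}

for name, x in samples.items():
    print(f"{name:12s} mean={x.mean():.3f} var={x.var():.3f}")
```

Note that `rng.exponential` is parameterized by the scale $1/\lambda$ rather than by the rate $\lambda$ used in the table.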
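2.3 Calculus and Optimization
2.3.1 Gradient Descent
Gradient: the vector of partial derivatives of a function. At a given point it points in the direction of steepest increase, so gradient descent steps in the opposite direction.
$$\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]$$
Gradient descent update rule:
$$\theta_{t+1} = \theta_t - \alpha \nabla J(\theta_t)$$
where $\alpha$ is the learning rate and $J(\theta)$ is the loss function.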
# Gradient descent implemented from scratch
def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    """
    Gradient descent for linear regression.
    X: feature matrix of shape (m, n)
    y: target vector of shape (m,)
    """
    m, n = X.shape
    theta = np.random.randn(n)  # random parameter initialization
    cost_history = []
    for epoch in range(epochs):
        # Predictions of the linear model
        y_pred = X.dot(theta)
        # Loss (mean squared error)
        cost = np.mean((y_pred - y) ** 2)
        cost_history.append(cost)
        # Gradient of the MSE with respect to theta
        gradient = (2 / m) * X.T.dot(y_pred - y)
        # Parameter update
        theta -= learning_rate * gradient
        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Cost: {cost:.4f}")
    return theta, cost_history

# Example usage
np.random.seed(42)
X = np.random.randn(100, 2)
y = 3 * X[:, 0] + 2 * X[:, 1] + np.random.randn(100) * 0.1
# Add a bias column of ones
X_b = np.c_[np.ones((100, 1)), X]
theta, costs = gradient_descent(X_b, y, learning_rate=0.1, epochs=1000)
print(f"Learned parameters: {theta}")  # expected to be close to [0, 3, 2]
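2.3.2 Evolution of Optimization Algorithms
| Algorithm | Characteristics | Typical use |
|---|---|---|
| SGD | Simple, memory-efficient | Large datasets |
| Momentum | Adds a momentum term that damps oscillation | Deep networks |
| RMSprop | Adaptive per-parameter learning rate | Non-stationary objectives |
| Adam | Combines Momentum and RMSprop | General-purpose default |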
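The update rules behind the table can be sketched in a few lines of NumPy. The example below is a minimal illustration, not a reference implementation: the toy quadratic loss, the function names `run_momentum` and `run_adam`, and all hyperparameter values are assumptions chosen for this sketch.

```python
import numpy as np

def loss(theta):
    """Toy ill-conditioned quadratic: J(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2)."""
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def grad(theta):
    """Gradient of the toy loss above."""
    return np.array([1.0, 10.0]) * theta

def run_momentum(theta0, lr=0.05, beta=0.9, steps=500):
    theta, v = theta0.copy(), np.zeros_like(theta0)
    for _ in range(steps):
        v = beta * v + grad(theta)   # decaying sum of past gradients
        theta = theta - lr * v
    return theta

def run_adam(theta0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    theta = theta0.copy()
    m, v = np.zeros_like(theta0), np.zeros_like(theta0)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # first moment (Momentum-like)
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment (RMSprop-like)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

theta0 = np.array([5.0, 5.0])
theta_momentum = run_momentum(theta0)
theta_adam = run_adam(theta0)
print(f"Initial loss:  {loss(theta0):.4f}")
print(f"Momentum loss: {loss(theta_momentum):.2e}")
print(f"Adam loss:     {loss(theta_adam):.2e}")
```

Adam divides each coordinate's step by a running estimate of its gradient magnitude, so the steep and shallow directions of the quadratic are treated alike, whereas plain momentum uses one global step size.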
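III. Machine Learning Fundamentals
3.1 Basic Concepts of Machine Learning
3.1.1 Supervised vs. Unsupervised Learning
| Type | Input | Output | Objective | Examples |
|---|---|---|---|---|
| Supervised learning | Features + labels | Predicted labels | Minimize prediction error | House price prediction, image classification |
| Unsupervised learning | Features only | Patterns/structure | Discover the intrinsic structure of the data | Clustering, dimensionality reduction |
| Reinforcement learning | States | Actions | Maximize cumulative reward | Game AI, robot control |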
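3.1.2 The Machine Learning Workflow
flowchart TD
A[Problem definition] --> B[Data collection]
B --> C[Data preprocessing]
C --> D[Feature engineering]
D --> E[Model selection]
E --> F[Model training]
F --> G[Model evaluation]
G --> H{Performance acceptable?}
H -->|No| I[Model tuning]
I --> F
H -->|Yes| J[Model deployment]
J --> K[Monitoring and maintenance]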
3.2 Data Preprocessing
3.2.1 Data Cleaning
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
# Create example data
df = pd.DataFrame({
    'age': [25, 30, np.nan, 35, 40],
    'income': [50000, 60000, 55000, np.nan, 70000],
    'category': ['A', 'B', 'A', 'C', np.nan],
    'target': [0, 1, 0, 1, 1]
})
print("Raw data:")
print(df)
print(f"\nMissing values per column:\n{df.isnull().sum()}")
# Handle missing values
df_cleaned = df.copy()
# Numeric columns: fill with the median.
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which is chained assignment and does not update the frame under pandas copy-on-write.
df_cleaned['age'] = df_cleaned['age'].fillna(df_cleaned['age'].median())
df_cleaned['income'] = df_cleaned['income'].fillna(df_cleaned['income'].median())
# Categorical columns: fill with the mode
df_cleaned['category'] = df_cleaned['category'].fillna(df_cleaned['category'].mode()[0])
# Handle outliers (IQR method)
def remove_outliers_iqr(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
# df_cleaned = remove_outliers_iqr(df_cleaned, 'income')
print("\nCleaned data:")
print(df_cleaned)
3.2.2 Feature Engineering
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
# Feature encoding
le = LabelEncoder()
df_cleaned['category_encoded'] = le.fit_transform(df_cleaned['category'])
# Feature scaling
scaler = StandardScaler()
numerical_features = ['age', 'income']
df_cleaned[numerical_features] = scaler.fit_transform(df_cleaned[numerical_features])
# Feature selection
X = df_cleaned[['age', 'income', 'category_encoded']]
y = df_cleaned['target']
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
selected_features = X.columns[selector.get_support()]
print(f"Selected features: {selected_features}")
# Dimensionality reduction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(f"PCA explained variance ratio: {pca.explained_variance_ratio_}")
3.3 Classic Machine Learning Algorithms
3.3.1 Linear Regression
Mathematical model:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n + \epsilon$$
Loss function (mean squared error):
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Generate example data
np.random.seed(42)
X = np.random.randn(100, 2)
y = 3 * X[:, 0] + 2 * X[:, 1] + np.random.randn(100) * 0.1
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
lr = LinearRegression()
lr.fit(X_train, y_train)
# Predict
y_pred = lr.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Coefficients: {lr.coef_}")
print(f"Intercept: {lr.intercept_:.4f}")
print(f"MSE: {mse:.4f}")
print(f"R²: {r2:.4f}")
3.3.2 Logistic Regression
Mathematical model (sigmoid function):
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T}x}}$$
Loss function (log loss):
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
# Generate binary classification data
X, y = make_classification(n_samples=1000, n_features=4, n_redundant=0,
                           n_informative=4, random_state=42, n_clusters_per_class=1)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train logistic regression
lr = LogisticRegression(random_state=42)
lr.fit(X_train, y_train)
# Predict labels and class probabilities
y_pred = lr.predict(X_test)
y_pred_proba = lr.predict_proba(X_test)
# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification report:")
print(classification_report(y_test, y_pred))
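3.3.3 Decision Trees and Random Forests
Decision tree principle: recursively split on the feature and threshold that maximize information gain.
Random forest: an ensemble of many decision trees that improves generalization through bagging and random feature selection.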
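The information-gain criterion mentioned above can be sketched directly. The helper names `entropy` and `information_gain` and the tiny dataset are assumptions made for this illustration; note also that scikit-learn's `DecisionTreeClassifier` defaults to Gini impurity and uses entropy only when `criterion="entropy"` is passed.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    weighted_child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted_child_entropy

# Tiny illustrative dataset: one feature, two balanced classes
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

print(f"Gain at threshold 3.5: {information_gain(x, y, 3.5):.4f}")  # perfect split
print(f"Gain at threshold 1.5: {information_gain(x, y, 1.5):.4f}")  # weak split
```

A tree learner evaluates candidate thresholds like these for every feature and keeps the split with the highest gain, then repeats the process on each child node.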
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
# Load the data
iris = load_iris()
X, y = iris.data, iris.target
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Decision tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
dt_accuracy = accuracy_score(y_test, dt_pred)
# Random forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
print(f"Decision tree accuracy: {dt_accuracy:.4f}")
print(f"Random forest accuracy: {rf_accuracy:.4f}")
# Feature importances
feature_importance = rf.feature_importances_
feature_names = iris.feature_names
plt.figure(figsize=(10, 6))
plt.bar(feature_names, feature_importance)
plt.title('Random Forest Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.show()
3.3.4 Support Vector Machines
Mathematical principle: find the maximum-margin separating hyperplane.
Kernel functions: implicitly map a problem that is not linearly separable into a higher-dimensional space where it is.
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler
# Generate data that is not linearly separable
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.2, random_state=42)
# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# SVMs with different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[kernel] = accuracy
    print(f"{kernel} kernel accuracy: {accuracy:.4f}")
# Visualize the RBF kernel result
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
# Build a grid in the standardized feature space
xx, yy = np.meshgrid(np.linspace(-3, 3, 500), np.linspace(-3, 3, 500))
Z = svm_rbf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title('Original data')
plt.subplot(1, 2, 2)
plt.contourf(xx, yy, Z, levels=50, cmap='RdYlBu', alpha=0.7)
# Plot the standardized points so they share coordinates with the decision surface
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='viridis')
plt.title('SVM decision boundary (RBF kernel)')
plt.show()
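3.4 Model Evaluation and Validation
3.4.1 Evaluation Metrics
| Task type | Metric | Formula | When to use |
|---|---|---|---|
| Regression | MSE | $\frac{1}{m}\sum_{i}(y_i - \hat{y}_i)^2$ | General-purpose regression evaluation |
| Regression | R² | $1 - \frac{SS_{res}}{SS_{tot}}$ | Explanatory power of the model |
| Binary classification | Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | Balanced datasets |
| Binary classification | Precision | $\frac{TP}{TP + FP}$ | When false positives matter |
| Binary classification | Recall | $\frac{TP}{TP + FN}$ | When false negatives matter |
| Binary classification | F1-score | $2 \cdot \frac{precision \cdot recall}{precision + recall}$ | Balancing precision and recall |
| Multiclass classification | Macro-F1 | Arithmetic mean of the per-class F1 scores | All classes equally important |
| Multiclass classification | Micro-F1 | F1 computed from the global TP, FP, and FN counts | Imbalanced data |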
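The binary-classification formulas in the table can be checked with a few lines of plain Python. The function name `binary_metrics` and the confusion-matrix counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives, 30 true negatives
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts accuracy is 0.70, precision 0.80, recall about 0.67, and F1 about 0.73, which shows how F1 sits between precision and recall and closer to the lower of the two.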
3.4.2 Cross-Validation
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
# X and y are the features and labels loaded in the preceding example
# Stratified K-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Evaluate a random forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(rf, X, y, cv=skf, scoring='accuracy')
print(f"Cross-validation accuracy per fold: {cv_scores}")
print(f"Mean accuracy: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")
# Compare several models with the same folds
models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(kernel='rbf', random_state=42)
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')
    results[name] = scores
    print(f"{name}: {scores.mean():.4f} ± {scores.std():.4f}")
3.4.3 Learning Curves and Validation Curves
from sklearn.model_selection import learning_curve, validation_curve
# Learning curve
def plot_learning_curve(estimator, X, y, title="Learning Curve"):
    # shuffle=True so that random_state actually affects the training subsets
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10), shuffle=True, random_state=42
    )
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)
    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
    plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color='blue')
    plt.plot(train_sizes, val_mean, 'o-', color='red', label='Validation Score')
    plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color='red')
    plt.xlabel('Training Set Size')
    plt.ylabel('Accuracy')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

# Plot the learning curve of a random forest
plot_learning_curve(RandomForestClassifier(n_estimators=100, random_state=42), X, y)
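A validation curve is the counterpart of a learning curve: instead of varying the training-set size, it varies one hyperparameter and tracks training and validation scores. The sketch below is one possible setup, assuming the Iris dataset, an RBF-kernel SVM inside a scaling pipeline, and `gamma` as the hyperparameter under study; none of these choices come from the original text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
param_range = np.logspace(-3, 2, 6)  # gamma values from 0.001 to 100

# Scaling lives inside the pipeline so each CV fold is standardized on its own training split
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
train_scores, val_scores = validation_curve(
    pipe, X_iris, y_iris,
    param_name="svc__gamma", param_range=param_range,
    cv=5, scoring="accuracy",
)

for gamma, tr, va in zip(param_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"gamma={gamma:<8g} train={tr:.3f} validation={va:.3f}")
```

A widening gap between training and validation accuracy at large `gamma` would indicate overfitting, while low scores on both at small `gamma` would indicate underfitting.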
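IV. Deep Learning Fundamentals
4.1 Neural Network Basics
4.1.1 The Perceptron and the Multilayer Perceptron
Perceptron (single-layer neural network):
$$y = f(w^{T}x + b)$$
where $f$ is the activation function (for example, a step function).
Multilayer perceptron (MLP):
$$\begin{aligned} h^{(1)} &= f\left(W^{(1)}x + b^{(1)}\right) \\ h^{(2)} &= f\left(W^{(2)}h^{(1)} + b^{(2)}\right) \\ &\ \ \vdots \\ y &= f\left(W^{(L)}h^{(L-1)} + b^{(L)}\right) \end{aligned}$$
4.1.2 Activation Functions
| Activation | Formula | Advantages | Drawbacks |
|---|---|---|---|
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | Output in (0, 1), probabilistic interpretation | Vanishing gradients, output not zero-centered |
| Tanh | $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | Zero-centered, stronger gradients | Vanishing gradients |
| ReLU | $\mathrm{ReLU}(x) = \max(0, x)$ | Cheap to compute, mitigates vanishing gradients | Dead neurons |
| Leaky ReLU | $\mathrm{LReLU}(x) = \max(0.01x, x)$ | Addresses dead neurons | Slope needs tuning |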
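The trade-offs in the table are easier to see numerically. The following NumPy sketch evaluates each activation and its derivative at a few points; the function names and the sample inputs are choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Equivalent to max(slope * x, x) for 0 < slope < 1
    return np.where(x > 0, x, slope * x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Derivatives: sigmoid and tanh saturate for large |x|, ReLU does not for x > 0
sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))
tanh_grad = 1.0 - np.tanh(x) ** 2
relu_grad = (x > 0).astype(float)

print("x:            ", x)
print("sigmoid grad: ", np.round(sigmoid_grad, 5))
print("tanh grad:    ", np.round(tanh_grad, 5))
print("relu grad:    ", relu_grad)
print("leaky_relu:   ", leaky_relu(x))
```

The sigmoid derivative peaks at 0.25 and is nearly zero at |x| = 10, which is the vanishing-gradient problem; the ReLU derivative is exactly zero for negative inputs, which is what makes dead neurons possible.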
import torch
import torch.nn as nn
import torch.optim as optim

# MLP implemented in PyTorch
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies softmax internally
        return x

# Train the MLP
def train_mlp(X_train, y_train, X_test, y_test, epochs=100):
    # Convert to PyTorch tensors
    X_train_tensor = torch.FloatTensor(X_train)
    y_train_tensor = torch.LongTensor(y_train)
    X_test_tensor = torch.FloatTensor(X_test)
    y_test_tensor = torch.LongTensor(y_test)
    # Create the model
    model = MLP(input_size=X_train.shape[1], hidden_size=64, output_size=len(np.unique(y_train)))
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    # Training loop (full-batch)
    train_losses = []
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train_tensor)
        loss = criterion(outputs, y_train_tensor)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())
        if epoch % 20 == 0:
            model.eval()  # disable dropout for evaluation
            with torch.no_grad():
                test_outputs = model(X_test_tensor)
                _, predicted = torch.max(test_outputs, 1)
                accuracy = (predicted == y_test_tensor).sum().item() / len(y_test_tensor)
                print(f"Epoch {epoch}, Loss: {loss.item():.4f}, Test Accuracy: {accuracy:.4f}")
    return model, train_losses

# Use the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model, losses = train_mlp(X_train, y_train, X_test, y_test, epochs=200)
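4.2 Convolutional Neural Networks
4.2.1 Basic Principles of CNNs
Convolutional layer: extracts local features
$$(I * K)(i, j) = \sum_{m}\sum_{n} I(i + m, j + n)\,K(m, n)$$
Pooling layer: reduces spatial resolution and adds a degree of translation invariance
- Max pooling: keeps the most salient feature
- Average pooling: smooths features
Fully connected layer: makes the classification decision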
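The convolution formula and max pooling above can be sketched directly in NumPy. This is an unoptimized illustration with no padding and stride 1; the function names `conv2d_valid` and `max_pool2d`, the input image, and the kernel are assumptions made for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), as in the formula above."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 ramp image
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                  # diagonal difference filter

feature_map = conv2d_valid(image, kernel)  # shape (4, 4)
pooled = max_pool2d(feature_map)           # shape (2, 2)
print(feature_map)
print(pooled)
```

Each output value is `image[i, j] - image[i+1, j+1]`, which is constant (-6) on this ramp image; deep learning frameworks compute the same operation with learned kernels and heavily optimized routines.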
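4.2.2 Evolution of CNN Architectures
| Architecture | Key innovation | Year |
|---|---|---|
| LeNet-5 | Pioneering CNN architecture | 1998 |
| AlexNet | ReLU, Dropout, GPU training | 2012 |
| VGGNet | Stacks of small convolution kernels | 2014 |
| GoogLeNet | Inception module | 2014 |
| ResNet | Residual connections | 2015 |
| EfficientNet | Compound scaling | 2019 |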
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Load the CIFAR-10 dataset
train_dataset = CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = CIFAR10(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# Use a pretrained ResNet18 (transfer learning).
# The `weights` argument replaces the deprecated `pretrained=True` (torchvision >= 0.13).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Replace the final fully connected layer to match CIFAR-10's 10 classes
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)
# Training configuration
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop (simplified)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
for epoch in range(10):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        # Report and reset the running statistics every 100 batches
        if i % 100 == 99:
            print(f'Epoch {epoch+1}, Batch {i+1}, Loss: {running_loss/100:.4f}, Accuracy: {100*correct/total:.2f}%')
            running_loss = 0.0
            correct = 0
            total = 0
# Test
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Test Accuracy: {100 * correct / total:.2f}%')
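4.3 Recurrent Neural Networks
4.3.1 Basic Principles of RNNs
Basic RNN cell:
$$\begin{aligned} h_t &= \tanh\left(W_{hh}h_{t-1} + W_{xh}x_t + b_h\right) \\ y_t &= W_{hy}h_t + b_y \end{aligned}$$
Problem: gradients vanish or explode, which makes long sequences hard to handle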
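The recurrence above can be unrolled over a short sequence in NumPy. This is a forward pass only, with randomly initialized weights; the dimensions and the initialization scale are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size, seq_len = 3, 4, 2, 5

# Randomly initialized parameters (illustrative, untrained)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

xs = rng.normal(size=(seq_len, input_size))  # one input vector per time step
h = np.zeros(hidden_size)                    # initial hidden state h_0
ys = []
for x_t in xs:
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
    h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
    # y_t = W_hy h_t + b_y
    ys.append(W_hy @ h + b_y)
ys = np.stack(ys)

print("Outputs shape:", ys.shape)
print("Final hidden state:", np.round(h, 4))
```

The same weight matrices are reused at every time step, so backpropagation multiplies by `W_hh` (and the tanh derivative) repeatedly, which is the source of the vanishing and exploding gradients noted above.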
4.3.2 LSTM and GRU
LSTM (long short-term memory):
- Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$




