cp11_15_TradingStrategies(dataframe.plot.scatter)2

This post examines how well deep neural networks (DNNs) predict the direction of financial markets, implementing algorithmic trading strategies with scikit-learn and TensorFlow. Comparing in-sample and out-of-sample performance demonstrates both the predictive power of DNNs and the risk of overfitting.

https://blog.youkuaiyun.com/Linli522362242/article/details/102314389

Deep Neural Networks

Deep neural networks (DNNs) try to emulate the functioning of the human brain. They are in general composed of an input layer (the features), an output layer (the labels), and a number of hidden layers. The presence of hidden layers is what makes a neural network deep; it allows the network to learn more complex relationships and to perform better on a number of problem types. When applying DNNs, one generally speaks of deep learning instead of machine learning. For an introduction to this field, refer to Géron (2017) or Gibson and Patterson (2017).
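As a concrete illustration (a toy sketch with made-up data, not the trading data used below), this layer structure maps directly onto scikit-learn's MLPClassifier: the width of the input layer is the number of feature columns, hidden_layer_sizes sets the hidden layers, and the output layer follows from the labels:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy data: 200 samples with 5 features and a binary label.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))        # input layer: 5 features
y = (X.sum(axis=1) > 0).astype(int)      # output layer: 2 classes

# Two hidden layers of 10 units each -- the "deep" part.
clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000,
                    random_state=0).fit(X, y)
print(clf.n_layers_)  # 4: input + 2 hidden + output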

DNNs with scikit-learn
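Note on prerequisites: the snippets in this section reuse objects built earlier in this series, in particular a DataFrame data with 'returns' and 'direction' columns, the lag count lags, the bin edges bins, and the list of digitized feature columns cols_bin. The following is a minimal sketch of how that setup might look; the file name, price column, lag count, and bin edges are assumptions, not necessarily the exact values used earlier:

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the earlier data preparation.
raw = pd.read_csv('eurusd.csv', index_col=0, parse_dates=True)  # assumed file
data = pd.DataFrame(raw['EUR/USD'])  # assumed price column
data['returns'] = np.log(data['EUR/USD'] / data['EUR/USD'].shift(1))
data['direction'] = np.sign(data['returns'])  # +1 up day, -1 down day

lags = 5  # assumed lag count
cols = []
for lag in range(1, lags + 1):
    col = f'lag_{lag}'
    data[col] = data['returns'].shift(lag)
    cols.append(col)
data.dropna(inplace=True)

# Digitize the lagged returns (assumed single threshold at 0).
bins = [0]
cols_bin = []
for col in cols:
    col_bin = col + '_bin'
    data[col_bin] = np.digitize(data[col], bins=bins)
    cols_bin.append(col_bin)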

This section applies the MLPClassifier algorithm from scikit-learn, as introduced in “Deep neural networks”. First, it is trained and tested on the whole data set, using the digitized features. The algorithm achieves exceptional performance in-sample (see Figure 15-17), which illustrates the power of DNNs for this type of problem. It also hints at strong overfitting, since the performance indeed seems unrealistically good:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(solver='lbfgs', alpha=1e-5,
                      hidden_layer_sizes=2 * [250], random_state=1)

%time model.fit(data[cols_bin], data['direction'])

# Predict positions on the same data the model was trained on (in-sample).
data['pos_dnn_sk'] = model.predict(data[cols_bin])

# Strategy returns: position times the market's log return.
data['strat_dnn_sk'] = data['pos_dnn_sk'] * data['returns']

# Gross performance of the passive benchmark vs. the strategy.
data[['returns', 'strat_dnn_sk']].sum().apply(np.exp)

data[['returns', 'strat_dnn_sk']].cumsum().apply(np.exp).plot(
    figsize=(10, 6),
    title='Performance of EUR/USD and DNN-based trading strategy '
          '(scikit-learn, in-sample)')
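To put a number on the in-sample fit (a hedged addition, not from the original text), the prediction accuracy can be compared with the roughly 50% hit ratio one would expect for an unpredictable market:

from sklearn.metrics import accuracy_score

# In-sample hit ratio; values close to 1.0 on financial return data
# signal overfitting rather than genuine predictive skill.
accuracy_score(data['direction'], data['pos_dnn_sk'])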

To avoid overfitting of the DNN model, a randomized train-test split is applied next. The algorithm again outperforms the passive benchmark investment and achieves a positive absolute performance (Figure 15-18). However, the results seem more realistic now:


from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.5, random_state=100)

train = train.copy().sort_index()

test = test.copy().sort_index()

# Increase the number of hidden layers and hidden units.
model = MLPClassifier(solver='lbfgs', alpha=1e-5, max_iter=500,
                      hidden_layer_sizes=3 * [500], random_state=1)

%time model.fit(train[cols_bin], train['direction'])

# Predict on the test set only (out-of-sample).
test['pos_dnn_sk'] = model.predict(test[cols_bin])

test['strat_dnn_sk'] = test['pos_dnn_sk'] * test['returns']

test[['returns', 'strat_dnn_sk']].sum().apply(np.exp)

test[['returns', 'strat_dnn_sk']].cumsum().apply(np.exp).plot(
    figsize=(10, 6),
    title='Performance of EUR/USD and DNN-based trading strategy '
          '(scikit-learn, randomized train-test split)')

DNNs with TensorFlow

TensorFlow has become a popular package for deep learning. It is developed and supported by Google Inc. and applied there to a great variety of machine learning problems. Zadeh and Ramsundar (2018) cover TensorFlow for deep learning in depth.

As with scikit-learn, the application of the DNNClassifier algorithm from TensorFlow to derive an algorithmic trading strategy is straightforward given the background from “Deep neural networks”. The training and test data are the same as before. First, the model is trained. In-sample, the algorithm outperforms the passive benchmark investment and shows a considerable absolute return (see Figure 15-19), again hinting at overfitting:

# Note: the tf.contrib.learn API below requires TensorFlow 1.x.
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

# One real-valued feature column holding all lagged, digitized features.
fc = [tf.contrib.layers.real_valued_column('lags', dimension=lags)]

model = tf.contrib.learn.DNNClassifier(hidden_units=3 * [500],
                                       n_classes=len(bins) + 1,
                                       feature_columns=fc)

def input_fn():
    # Features and labels as TensorFlow constants; direction is
    # recoded from -1/+1 to 0/1 class labels.
    fc = {'lags': tf.constant(data[cols_bin].values)}
    la = tf.constant(data['direction'].apply(
                         lambda x: 0 if x < 0 else 1).values,
                     shape=[data['direction'].size, 1])
    return fc, la

%time model.fit(input_fn=input_fn, steps=250)

model.evaluate(input_fn=input_fn, steps=1)

pred = np.array(list(model.predict(input_fn=input_fn)))
pred[:50]

# Translate the 0/1 class predictions into -1/+1 market positions.
data['pos_dnn_tf'] = np.where(pred > 0, 1, -1)

data['strat_dnn_tf'] = data['pos_dnn_tf'] * data['returns']

data[['returns', 'strat_dnn_tf']].sum().apply(np.exp)

data[['returns', 'strat_dnn_tf']].cumsum().apply(np.exp).plot(
    figsize=(10, 6),
    title='Performance of EUR/USD and DNN-based trading strategy '
          '(TensorFlow, in-sample)')

The following code again implements a randomized train-test split to get a more realistic view of the performance of the DNN-based algorithmic trading strategy. The performance is, as expected, worse out-of-sample (see Figure 15-20). In addition, given the specific parameterization, the TensorFlow DNNClassifier underperforms the scikit-learn MLPClassifier algorithm by quite a few percentage points:

model = tf.contrib.learn.DNNClassifier(hidden_units=3 * [500],
                                       n_classes=len(bins) + 1,
                                       feature_columns=fc)

# input_fn() reads the global name `data`, so point it at the training
# set for fitting and at the test set for evaluation and prediction.
data = train

%time model.fit(input_fn=input_fn, steps=2500)

data = test

model.evaluate(input_fn=input_fn, steps=1)

pred = np.array(list(model.predict(input_fn=input_fn)))

test['pos_dnn_tf'] = np.where(pred > 0, 1, -1)

test['strat_dnn_tf'] = test['pos_dnn_tf'] * test['returns']

test[['returns', 'strat_dnn_sk', 'strat_dnn_tf']].sum().apply(np.exp)

test[['returns', 'strat_dnn_sk', 'strat_dnn_tf']].cumsum().apply(np.exp).plot(
    figsize=(10, 6),
    title='Performance of EUR/USD and DNN-based trading strategy '
          '(TensorFlow, randomized train-test split)')
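A portability note: tf.contrib was removed in TensorFlow 2.x, so the code above runs only on TensorFlow 1.x. The following is a rough sketch of an equivalent model in the TensorFlow 2.x Keras API; the layer sizes mirror the DNNClassifier configuration above, while the optimizer, loss, epoch count, and the pos_dnn_tf2 column name are assumptions:

import numpy as np
import tensorflow as tf  # TensorFlow 2.x

# Same 3 hidden layers with 500 units each; sigmoid output for the
# binary up/down classification.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(500, activation='relu'),
    tf.keras.layers.Dense(500, activation='relu'),
    tf.keras.layers.Dense(500, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

X_train = train[cols_bin].values.astype('float32')
y_train = (train['direction'] > 0).astype(int).values

model.fit(X_train, y_train, epochs=25, verbose=0)

# Probabilities above 0.5 become long positions, otherwise short.
pred = model.predict(test[cols_bin].values.astype('float32'))
test['pos_dnn_tf2'] = np.where(pred.flatten() > 0.5, 1, -1)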

[f'PC{i + 1}' for i in indices[:20]], rotation=45) plt.tight_layout() plt.savefig(f'feature_importance_{best_model_name}.png', dpi=300, bbox_inches='tight') plt.show() # ========== 新增:SHAP特征分析组合图 ========== if SHAP_AVAILABLE: print("\n生成SHAP特征分析组合图...") # 只为最佳模型生成SHAP组合图 if best_model_name in ['RandomForest', 'DecisionTree', 'GradientBoosting', 'XGBoost', 'LightGBM']: try: # 创建SHAP解释器 explainer = shap.TreeExplainer(best_model) shap_values = explainer.shap_values(X_test_pca) # 对于多分类问题,选择第一个类别的SHAP值 if isinstance(shap_values, list): shap_values_used = shap_values[0] # 使用第一个类别的SHAP值 else: shap_values_used = shap_values # 创建组合图:全局重要性(左)与个体影响(右) fig = plt.figure(figsize=(20, 10)) gs = gridspec.GridSpec(1, 2, width_ratios=[1, 1]) # 左图:全局特征重要性 ax1 = plt.subplot(gs[0]) shap.summary_plot(shap_values_used, X_test_pca, plot_type="bar", feature_names=[f'PC{i + 1}' for i in range(X_test_pca.shape[1])], show=False, max_display=15) ax1.set_title(f'{best_model_name} - SHAP全局特征重要性', fontsize=16, fontweight='bold') # 右图:个体样本SHAP值 ax2 = plt.subplot(gs[1]) # 选择一个有代表性的样本(SHAP值绝对值最大的样本) sample_idx = np.argmax(np.sum(np.abs(shap_values_used), axis=1)) # 绘制瀑布图显示个体样本的SHAP值 shap.waterfall_plot( explainer.expected_value[0] if isinstance(explainer.expected_value, list) else explainer.expected_value, shap_values_used[sample_idx], feature_names=[f'PC{i + 1}' for i in range(X_test_pca.shape[1])], show=False, max_display=15) ax2.set_title(f'{best_model_name} - 个体样本SHAP值分析\n(样本索引: {sample_idx})', fontsize=16, fontweight='bold') plt.tight_layout() plt.savefig(f'shap_combined_{best_model_name}.png', dpi=300, bbox_inches='tight') plt.show() # 创建另一个组合图:SHAP摘要图与依赖图 fig2 = plt.figure(figsize=(20, 10)) gs2 = gridspec.GridSpec(1, 2, width_ratios=[1, 1]) # 左图:SHAP摘要图(显示特征值与SHAP值的关系) ax3 = plt.subplot(gs2[0]) shap.summary_plot(shap_values_used, X_test_pca, feature_names=[f'PC{i + 1}' for i in range(X_test_pca.shape[1])], show=False, max_display=15) ax3.set_title(f'{best_model_name} - SHAP摘要图', fontsize=16, fontweight='bold') # 右图:SHAP依赖图(最重要的特征) ax4 = plt.subplot(gs2[1]) # 找到最重要的特征 if hasattr(best_model, 'feature_importances_'): feature_importances = best_model.feature_importances_ most_important_feature = np.argmax(feature_importances) else: # 如果没有feature_importances_属性,使用SHAP值计算重要性 feature_importances = np.mean(np.abs(shap_values_used), axis=0) most_important_feature = np.argmax(feature_importances) shap.dependence_plot(most_important_feature, shap_values_used, X_test_pca, feature_names=[f'PC{i + 1}' for i in range(X_test_pca.shape[1])], show=False, ax=ax4) ax4.set_title(f'{best_model_name} - SHAP依赖图\n(最重要的特征: PC{most_important_feature + 1})', fontsize=16, fontweight='bold') plt.tight_layout() plt.savefig(f'shap_detailed_{best_model_name}.png', dpi=300, bbox_inches='tight') plt.show() print(f"已保存 SHAP组合图 for {best_model_name}") except Exception as e: print(f"生成SHAP组合图失败 for {best_model_name}: {e}") else: print(f"最佳模型 {best_model_name} 不支持TreeExplainer,跳过SHAP组合图") else: print("\nSHAP不可用,跳过SHAP组合图") # ========== 生成四因子表格 ========== four_factor_cols = ['Short_term_acid_factor', 'Long_term_acid_factor', 'Sulfonate_factor', 'Exposure_factor'] if all(col in df.columns for col in four_factor_cols): four_factor_table = df[[target_column] + four_factor_cols].head(10) # 解码目标变量以显示原始标签 four_factor_table[target_column] = le.inverse_transform(four_factor_table[target_column]) print_table(four_factor_table.values.tolist(), headers=[target_column] + four_factor_cols, title="四因子理论表格 (前10行)") # ========== 生成PCA组件表格 ========== pca_components = 
pd.DataFrame(pca.components_, columns=features) print(f"\nPCA主成分表格 (前5个主成分):") print_table(pca_components.head().values.tolist(), headers=['PC'] + features, title="PCA主成分权重") # ========== 保存重要结果 ========== results_df.to_csv('model_results.csv', index=False) print("\n模型结果已保存到 'model_results.csv'") # 显示最佳模型 best_model = results_df.iloc[0] print(f"\n最佳模型: {best_model['Model']} (准确率: {best_model['Accuracy']:.4f})") print("\n所有图表已生成完成!")完善改进代码
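As posted, the script is not self-contained: it calls apply_four_factor_theory and print_table and tests the XGB_AVAILABLE / LGBM_AVAILABLE / SHAP_AVAILABLE flags without showing their definitions. A minimal sketch of what those pieces might look like is below; only the optional-import pattern is standard practice, the factor groupings are illustrative placeholders rather than the actual four-factor definitions from the cited theory, and tabulate is an assumed dependency for print_table.

# Hypothetical stubs for the helpers the script assumes; adjust before use.
import numpy as np
import pandas as pd
from tabulate import tabulate  # assumed dependency for print_table

try:
    from xgboost import XGBClassifier
    XGB_AVAILABLE = True
except ImportError:
    XGB_AVAILABLE = False

try:
    from lightgbm import LGBMClassifier
    LGBM_AVAILABLE = True
except ImportError:
    LGBM_AVAILABLE = False

try:
    import shap
    SHAP_AVAILABLE = True
except ImportError:
    SHAP_AVAILABLE = False

def apply_four_factor_theory(df):
    """Illustrative four-factor aggregation: sum the PFAS columns present in
    each (assumed) group; replace the groupings with the real definitions."""
    groups = {
        'Short_term_acid_factor': ['PFBA', 'PFPeA', 'PFHxA'],          # short-chain acids (assumed)
        'Long_term_acid_factor': ['PFHpA', 'PFOA', 'PFNA', 'PFDA'],    # long-chain acids (assumed)
        'Sulfonate_factor': ['PFBS', 'PFHxS', 'PFOS'],                 # sulfonates (assumed)
    }
    for factor, cols in groups.items():
        present = [c for c in cols if c in df.columns]
        df[factor] = df[present].sum(axis=1) if present else 0.0
    # Exposure proxy: total burden across the three groups (assumed)
    df['Exposure_factor'] = df[list(groups)].sum(axis=1)
    return df

def print_table(rows, headers, title=None):
    """Thin wrapper around tabulate for aligned console tables."""
    if title:
        print(f"\n{title}")
    print(tabulate(rows, headers=headers, tablefmt='grid', floatfmt='.4f'))

With these stubs in place the pipeline runs end to end on any CSV that has the PFAS columns; swapping in the real factor formulas only changes apply_four_factor_theory.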
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
import os
import matplotlib.tri as mtri
from datetime import datetime
from geometry_processor import *

# Configure CJK-capable fonts once (the original set font.family twice;
# the second assignment simply overrode the first)
plt.rcParams["font.family"] = ["SimHei", "WenQuanYi Micro Hei", "Heiti TC", "Arial Unicode MS"]
plt.rcParams["axes.unicode_minus"] = False


class FlowVisualizer:
    """Flow-field visualizer."""

    def __init__(self):
        self.geometry = None
        self.results = None
        self.flow_conditions = None
        self.config = None
        self.view_type = 'top'   # default: top view
        self.analysis_log = []   # log storage

    def setup(self, geometry, results, flow_conditions, config):
        """Initialize the visualizer parameters."""
        self.geometry = geometry
        self.results = results
        self.flow_conditions = flow_conditions
        self.config = config
        return self

    def logger(self, message):
        """Logging method (fixes the missing logger attribute)."""
        timestamp = datetime.now().strftime("%H:%M:%S")
        log_entry = f"[{timestamp}] {message}"
        print(log_entry)                     # echo to the console
        self.analysis_log.append(log_entry)  # keep in the log list

    def plot_geometry_overview(self):
        """Plot a geometry overview."""
        fig = plt.figure(figsize=(15, 10))
        # 3D view
        ax1 = fig.add_subplot(221, projection='3d')
        self._plot_3d_geometry(ax1)
        # Top view
        ax2 = fig.add_subplot(222)
        self._plot_2d_projection(ax2, 'top')
        # Side view
        ax3 = fig.add_subplot(223)
        self._plot_2d_projection(ax3, 'side')
        # Front view
        ax4 = fig.add_subplot(224)
        self._plot_2d_projection(ax4, 'front')
        plt.tight_layout()
        if self.config.get('save_figures', False):
            plt.savefig('geometry_overview.png', dpi=self.config.get('dpi', 300), bbox_inches='tight')
        if self.config.get('auto_show', True):
            plt.show()

    def _plot_3d_geometry(self, ax):
        """Plot the 3D geometry (fixes the 3D-vertices-to-2D conversion)."""
        # Subsample the faces to avoid overly dense plots
        max_faces = 5000
        step = max(1, len(self.geometry.elements) // max_faces)
        sample_elements = self.geometry.elements[::step]

        faces_2d = []  # 2D vertices (X-Y projection)
        z_coords = []  # mean z of each face, used for 3D placement
        for elem in sample_elements:
            # 3D vertices, shape (N, 3); N = 3 for triangles, 4 for quads
            vertices_3d = self.geometry.nodes[elem]
            # Project onto the X-Y plane (keep the x and y columns)
            vertices_2d = vertices_3d[:, :2]
            faces_2d.append(vertices_2d)
            # Mean z coordinate, used to position the patch in 3D
            avg_z = np.mean(vertices_3d[:, 2])
            z_coords.append(avg_z)

        # Build the 2D patch collection (the input must be 2D vertices)
        face_collection = PolyCollection(
            faces_2d, alpha=0.3, facecolor='lightblue',
            edgecolor='darkblue', linewidth=0.1
        )
        # Add the patches to the 3D axes, placing each at its z coordinate
        ax.add_collection3d(face_collection, zs=z_coords, zdir='z')

        # Axis limits
        bounds = self.geometry.get_bounds()
        ax.set_xlim(bounds[0])
        ax.set_ylim(bounds[1])
        ax.set_zlim(bounds[2])
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        ax.set_title('3D geometry')
        ax.view_init(elev=20, azim=45)

    def _plot_2d_projection(self, ax, view_type):
        """Plot a 2D projection."""
        nodes = self.geometry.nodes
        if view_type == 'top':     # X-Y plane
            x, y = nodes[:, 0], nodes[:, 1]
            xlabel, ylabel = 'X (length)', 'Y (width)'
            title = 'Top view (X-Y)'
        elif view_type == 'side':  # X-Z plane
            x, y = nodes[:, 0], nodes[:, 2]
            xlabel, ylabel = 'X (length)', 'Z (height)'
            title = 'Side view (X-Z)'
        else:                      # 'front', Y-Z plane
            x, y = nodes[:, 1], nodes[:, 2]
            xlabel, ylabel = 'Y (width)', 'Z (height)'
            title = 'Front view (Y-Z)'
        ax.scatter(x, y, s=0.5, alpha=0.6, c='blue')
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.set_title(title)
        ax.set_aspect('equal')
        ax.grid(True, alpha=0.3)

    def plot_pressure_distribution(self):
        """Plot the pressure distribution with stricter coordinate/data matching."""
        try:
            # Pressure-coefficient data
            if 'shock_expansion' in self.results:
                pressure_data = self.results['shock_expansion']['element_data']['pressure_coefficients']
                data_source = "shock-corrected"
            else:
                pressure_data = self.results['flow_field']['pressure_coefficients']
                data_source = "baseline flow field"

            # Strict matching: pick the coordinate set whose length matches the data
            if len(pressure_data) == len(self.geometry.face_centers):
                coords = self.geometry.face_centers  # face-center coordinates
                self.logger("Using face-center coordinates")
            elif len(pressure_data) == len(self.geometry.nodes):
                coords = self.geometry.nodes         # node coordinates
                self.logger("Using node coordinates")
            else:
                # Fall back to whichever coordinate set is closest in length
                if abs(len(pressure_data) - len(self.geometry.face_centers)) < abs(
                        len(pressure_data) - len(self.geometry.nodes)):
                    coords = self.geometry.face_centers
                else:
                    coords = self.geometry.nodes
                self.logger(
                    f"Warning: length mismatch, using fallback coordinates "
                    f"(data length={len(pressure_data)}, coord length={len(coords)})")

            # Project onto the selected 2D view
            if self.view_type == 'top':
                x, y = coords[:, 0], coords[:, 1]
                xlabel, ylabel = 'X', 'Y'
            elif self.view_type == 'side':
                x, y = coords[:, 0], coords[:, 2]
                xlabel, ylabel = 'X', 'Z'
            else:
                x, y = coords[:, 1], coords[:, 2]
                xlabel, ylabel = 'Y', 'Z'

            # Truncate to a common length
            min_length = min(len(x), len(pressure_data))
            x = x[:min_length]
            y = y[:min_length]
            pressure_data = pressure_data[:min_length]

            # Scatter plot of the pressure coefficients
            fig, ax = plt.subplots(figsize=(10, 8))
            scatter = ax.scatter(x, y, c=pressure_data, cmap='jet', s=10, alpha=0.8)
            plt.colorbar(scatter, ax=ax, label=f'Pressure coefficient ({data_source})')
            ax.set_xlabel(xlabel)
            ax.set_ylabel(ylabel)
            ax.set_title(f'Surface pressure coefficient (Ma={self.flow_conditions["mach_number"]})')
            ax.set_aspect('equal')

            if self.config["save_figures"]:
                self._save_figure(fig, "pressure_distribution")
            if self.config["auto_show"]:
                plt.show()
            else:
                plt.close(fig)
        except Exception as e:
            self.logger(f"❌ Pressure-distribution plot failed: {str(e)}")
            # Detailed debug output
            if hasattr(self, 'geometry'):
                self.logger(f"Debug: nodes={len(self.geometry.nodes)}, "
                            f"elements={len(self.geometry.elements)}")
                if hasattr(self.geometry, 'face_centers'):
                    self.logger(f"face centers={len(self.geometry.face_centers)}")
            if 'flow_field' in self.results:
                self.logger(f"pressure data length="
                            f"{len(self.results['flow_field']['pressure_coefficients'])}")

    def _interpolate_data(self, data, target_length):
        """Interpolate data to a target length and verify the result."""
        interpolated = np.interp(
            np.linspace(0, len(data) - 1, target_length),  # target indices
            np.arange(len(data)),                          # source indices
            data
        )
        # The interpolated array must match the target length
        assert len(interpolated) == target_length, \
            f"Interpolation failed: target {target_length}, got {len(interpolated)}"
        return interpolated

    def plot_velocity_field(self):
        """Plot the velocity field."""
        if 'flow_field' not in self.results:
            print("❌ No flow-field data")
            return
        surface_velocities = self.results['flow_field']['surface_velocities']
        velocity_magnitudes = np.linalg.norm(surface_velocities, axis=1)

        fig = plt.figure(figsize=(18, 12))
        # 1. Velocity magnitude contours, top view
        ax1 = fig.add_subplot(231)
        self._plot_contour_2d(ax1, velocity_magnitudes, 'top', 'velocity',
                              'Velocity magnitude - top view')
        # 2. Velocity magnitude contours, side view
        ax2 = fig.add_subplot(232)
        self._plot_contour_2d(ax2, velocity_magnitudes, 'side', 'velocity',
                              'Velocity magnitude - side view')
        # 3. Velocity vectors, top view
        ax3 = fig.add_subplot(233)
        self._plot_vector_field_2d(ax3, surface_velocities, 'top', 'Velocity vectors - top view')
        # 4. Velocity vectors, side view
        ax4 = fig.add_subplot(234)
        self._plot_vector_field_2d(ax4, surface_velocities, 'side', 'Velocity vectors - side view')
        # 5. Impact-angle histogram
        ax5 = fig.add_subplot(235)
        impact_angles = self.results['flow_field']['impact_angles']
        self._plot_histogram(ax5, np.degrees(impact_angles), 'angle', 'Impact-angle histogram')
        # 6. 3D velocity distribution
        ax6 = fig.add_subplot(236, projection='3d')
        self._plot_3d_scalar_field(ax6, velocity_magnitudes, 'velocity', '3D velocity magnitude')
        plt.tight_layout()
        if self.config.get('save_figures', False):
            plt.savefig('velocity_field.png', dpi=self.config.get('dpi', 300), bbox_inches='tight')
        if self.config.get('auto_show', True):
            plt.show()

    def plot_streamlines(self):
        """Plot streamlines."""
        if 'streamlines' not in self.results:
            print("❌ No streamline data")
            return
        streamlines = self.results['streamlines']['streamlines']
        fig = plt.figure(figsize=(15, 10))
        # 3D streamlines
        ax1 = fig.add_subplot(121, projection='3d')
        self._plot_3d_streamlines(ax1, streamlines)
        # 2D streamline projection
        ax2 = fig.add_subplot(122)
        self._plot_2d_streamlines(ax2, streamlines, 'side')
        plt.tight_layout()
        if self.config.get('save_figures', False):
            plt.savefig('streamlines.png', dpi=self.config.get('dpi', 300), bbox_inches='tight')
        if self.config.get('auto_show', True):
            plt.show()

    def plot_shock_patterns(self):
        """Plot shock patterns."""
        if 'shock_expansion' not in self.results:
            print("❌ No shock data")
            return
        shock_data = self.results['shock_expansion']
        fig = plt.figure(figsize=(18, 12))
        # 1. Mach number distribution
        ax1 = fig.add_subplot(231)
        mach_data = shock_data['node_data']['mach_numbers']
        self._plot_contour_2d(ax1, mach_data, 'side', 'mach', 'Mach number distribution')
        # 2. Pressure-ratio distribution
        ax2 = fig.add_subplot(232)
        pressure_data = shock_data['node_data']['pressures']
        pressure_ratio = pressure_data / self.flow_conditions['pressure']
        self._plot_contour_2d(ax2, pressure_ratio, 'side', 'pressure_ratio',
                              'Pressure-ratio distribution')
        # 3. Temperature distribution
        ax3 = fig.add_subplot(233)
        temp_data = shock_data['node_data']['temperatures']
        self._plot_contour_2d(ax3, temp_data, 'side', 'temperature', 'Temperature distribution')
        # 4. Mach number histogram
        ax4 = fig.add_subplot(234)
        self._plot_histogram(ax4, mach_data, 'mach', 'Mach number histogram')
        # 5. Shock-strength analysis
        ax5 = fig.add_subplot(235)
        self._plot_shock_strength_analysis(ax5, shock_data)
        # 6. 3D shock pattern
        ax6 = fig.add_subplot(236, projection='3d')
        self._plot_3d_scalar_field(ax6, pressure_ratio, 'pressure_ratio',
                                   '3D pressure-ratio distribution')
        plt.tight_layout()
        if self.config.get('save_figures', False):
            plt.savefig('shock_patterns.png', dpi=self.config.get('dpi', 300), bbox_inches='tight')
        if self.config.get('auto_show', True):
            plt.show()

    def plot_force_distribution(self):
        """Plot the force distribution."""
        if 'aerodynamics' not in self.results:
            print("❌ No aerodynamic data")
            return
        aero_data = self.results['aerodynamics']
        fig = plt.figure(figsize=(15, 8))
        # 1. Force coefficients
        ax1 = fig.add_subplot(131)
        self._plot_force_coefficients(ax1, aero_data)
        # 2. Force components
        ax2 = fig.add_subplot(132)
        self._plot_force_components(ax2, aero_data)
        # 3. Moment coefficients
        ax3 = fig.add_subplot(133)
        self._plot_moment_coefficients(ax3, aero_data)
        plt.tight_layout()
        if self.config.get('save_figures', False):
            plt.savefig('force_distribution.png', dpi=self.config.get('dpi', 300), bbox_inches='tight')
        if self.config.get('auto_show', True):
            plt.show()

    def _validate_data_consistency(self):
        """Check that all data arrays are consistent with the geometry."""
        node_count = len(self.geometry.nodes)     # baseline node count
        face_count = len(self.geometry.elements)  # baseline face count
        print(f"Data check: nodes={node_count}, faces={face_count}")
        # Check every data array
        if hasattr(self, 'data'):
            for name, data in self.data.items():
                data_len = len(data)
                if data_len != node_count and data_len != face_count:
                    print(f"⚠️ Inconsistent data: {name} length={data_len} "
                          f"(expected {node_count} or {face_count})")
                    # Relative differences
                    node_ratio = abs(data_len - node_count) / node_count
                    face_ratio = abs(data_len - face_count) / face_count
                    # Hint at the likely source of the error
                    if node_ratio < 0.1:    # within 10% of the node count
                        print(f"  Hint: close to the node count ({node_ratio:.1%}); "
                              f"likely an indexing error")
                    elif face_ratio < 0.1:  # within 10% of the face count
                        print(f"  Hint: close to the face count ({face_ratio:.1%}); "
                              f"likely a computation error")

    def _build_face_center_elements(self):
        """Build a triangulation over face centers using an edge hash map
        (avoids the O(n^2) neighbor search)."""
        try:
            # 1. Map each edge (a sorted vertex-index pair) to the faces that share it
            edge_map = {}
            for face_idx, elem in enumerate(self.geometry.elements):
                if len(elem) == 3:    # triangle: 3 edges
                    edges = [
                        tuple(sorted((elem[0], elem[1]))),
                        tuple(sorted((elem[1], elem[2]))),
                        tuple(sorted((elem[2], elem[0])))
                    ]
                elif len(elem) == 4:  # quad: 4 edges
                    edges = [
                        tuple(sorted((elem[0], elem[1]))),
                        tuple(sorted((elem[1], elem[2]))),
                        tuple(sorted((elem[2], elem[3]))),
                        tuple(sorted((elem[3], elem[0])))
                    ]
                else:
                    self.logger(f"Warning: unsupported element type ({len(elem)} vertices)")
                    continue
                for edge in edges:
                    if edge not in edge_map:
                        edge_map[edge] = []
                    edge_map[edge].append(face_idx)

            # 2. Build the face adjacency from shared edges
            face_adjacency = [[] for _ in range(len(self.geometry.elements))]
            for edge, faces in edge_map.items():
                # A shared edge belongs to at most two faces
                if len(faces) == 2:
                    face1, face2 = faces
                    if face2 not in face_adjacency[face1]:
                        face_adjacency[face1].append(face2)
                    if face1 not in face_adjacency[face2]:
                        face_adjacency[face2].append(face1)

            # 3. Generate triangulation indices
            elements = []
            for i in range(len(face_adjacency)):
                neighbors = face_adjacency[i]
                # Use the first two neighbors to form a triangle
                if len(neighbors) >= 2:
                    elements.append([i, neighbors[0], neighbors[1]])
                # With only one neighbor, fall back to the nearest other face
                elif len(neighbors) == 1:
                    min_dist = float('inf')
                    closest_face = -1
                    current_center = self.geometry.face_centers[i]
                    for j in range(len(face_adjacency)):
                        if j != i and j not in neighbors:
                            dist = np.linalg.norm(current_center - self.geometry.face_centers[j])
                            if dist < min_dist:
                                min_dist = dist
                                closest_face = j
                    if closest_face != -1:
                        elements.append([i, neighbors[0], closest_face])
            return np.array(elements, dtype=int) if elements else None
        except Exception as e:
            self.logger(f"Failed to build face-center indices: {str(e)}")
            return None

    def _plot_contour_2d(self, ax, data, view_type, data_type, title):
        try:
            # Is the data defined on nodes or on faces?
            if len(data) == len(self.geometry.nodes):
                # Node data: use the original element connectivity
                nodes = self.geometry.nodes
                data_is_face = False
                elements = self.geometry.elements
            elif len(data) == len(self.geometry.elements):
                # Face data: use face-center coordinates
                face_centers = self.geometry.face_centers
                data_is_face = True
                if len(face_centers) != len(self.geometry.elements):
                    raise ValueError("Face-center count does not match the face count")
                elements = self._build_face_center_elements()
            else:
                raise ValueError(f"Unrecognized data length: {len(data)}")

            if view_type == 'top':     # X-Y plane
                if data_is_face:
                    x, y = face_centers[:, 0], face_centers[:, 1]
                else:
                    x, y = nodes[:, 0], nodes[:, 1]
                xlabel, ylabel = 'X', 'Y'
            elif view_type == 'side':  # X-Z plane
                if data_is_face:
                    x, y = face_centers[:, 0], face_centers[:, 2]
                else:
                    x, y = nodes[:, 0], nodes[:, 2]
                xlabel, ylabel = 'X', 'Z'
            else:                      # front, Y-Z plane
                if data_is_face:
                    x, y = face_centers[:, 1], face_centers[:, 2]
                else:
                    x, y = nodes[:, 1], nodes[:, 2]
                xlabel, ylabel = 'Y', 'Z'

            # Flatten everything to 1D
            x = x.flatten() if x.ndim > 1 else x
            y = y.flatten() if y.ndim > 1 else y
            data = data.flatten() if data.ndim > 1 else data
            print(f"data length: {len(data)}, nodes: {len(self.geometry.nodes)}, "
                  f"faces: {len(self.geometry.elements)}")
            if data_is_face:
                print(f"face-center coordinate length: {len(face_centers)}")
            else:
                print(f"node coordinate length: {len(nodes)}")

            # Array lengths must agree
            if len(x) != len(y) or len(x) != len(data):
                raise ValueError(f"Array length mismatch: x={len(x)}, y={len(y)}, data={len(data)}")

            # Colormap selection
            if data_type == 'pressure':
                cmap, label = 'RdYlBu_r', 'Cp'
            elif data_type == 'velocity':
                cmap, label = 'plasma', 'V (m/s)'
            elif data_type == 'mach':
                cmap, label = 'jet', 'Ma'
            elif data_type == 'temperature':
                cmap, label = 'hot', 'T (K)'
            elif data_type == 'pressure_ratio':
                cmap, label = 'RdYlBu_r', 'p/p∞'
            else:
                cmap, label = 'viridis', 'Value'

            # Triangulation: use the connectivity that matches the data type
            if data_is_face:
                # Face data: a proper face-center topology would go here;
                # temporary fallback: automatic Delaunay triangulation (may be inaccurate)
                triangulation = mtri.Triangulation(x, y)
            else:
                # Node data: use the element connectivity from the geometry
                triangulation = mtri.Triangulation(x, y, triangles=elements)

            # Draw the filled contours
            tcf = ax.tripcolor(triangulation, data, shading='gouraud', cmap=cmap)
            plt.colorbar(tcf, ax=ax, label=label, shrink=0.8)
            ax.set_xlabel(xlabel)
            ax.set_ylabel(ylabel)
            ax.set_title(title)
            ax.set_aspect('equal')
            ax.grid(True, alpha=0.3)
            return True
        except Exception as e:
            print(f"Contour plot failed: {e}")
            import traceback
            traceback.print_exc()
            return False

    def _plot_vector_field_2d(self, ax, vectors, view_type, title):
        """Plot a 2D vector field (fixes the data/coordinate length mismatch)."""
        # Key fix: pick the coordinate set that matches the vector count
        if len(vectors) == len(self.geometry.face_centers):
            coords = self.geometry.face_centers  # per-face data
            self.logger(f"Using face-center coordinates for the vector field "
                        f"(length: {len(coords)})")
        else:
            coords = self.geometry.nodes         # per-node data
            self.logger(f"Using node coordinates for the vector field (length: {len(coords)})")

        # Projection plane by view type
        if view_type == 'top':
            x, y = coords[:, 0], coords[:, 1]
            u, v = vectors[:, 0], vectors[:, 1]
            xlabel, ylabel = 'X', 'Y'
        elif view_type == 'side':
            x, y = coords[:, 0], coords[:, 2]
            u, v = vectors[:, 0], vectors[:, 2]
            xlabel, ylabel = 'X', 'Z'
        else:  # front
            x, y = coords[:, 1], coords[:, 2]
            u, v = vectors[:, 1], vectors[:, 2]
            xlabel, ylabel = 'Y', 'Z'

        # Length consistency check
        if len(x) != len(vectors) or len(y) != len(vectors):
            raise ValueError(
                f"Vector/coordinate length mismatch: vectors={len(vectors)}, "
                f"x={len(x)}, y={len(y)}"
            )

        # Subsample to avoid overly dense arrows
        step = max(1, len(coords) // 500)  # step size scales with the coordinate count
        sample_indices = range(0, len(coords), step)
        ax.quiver(
            x[sample_indices], y[sample_indices],
            u[sample_indices], v[sample_indices],
            scale=20, alpha=0.7, width=0.002, color='blue'
        )
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.set_title(title)
        ax.set_aspect('equal')

    def _plot_histogram(self, ax, data, data_type, title):
        """Plot a histogram."""
        if data_type == 'pressure':
            bins, xlabel, color = 50, 'Pressure coefficient Cp', 'skyblue'
        elif data_type == 'velocity':
            bins, xlabel, color = 50, 'Velocity magnitude (m/s)', 'lightgreen'
        elif data_type == 'angle':
            bins, xlabel, color = 50, 'Impact angle (°)', 'orange'
        elif data_type == 'mach':
            bins, xlabel, color = 50, 'Mach number', 'lightcoral'
        else:
            bins, xlabel, color = 50, 'Value', 'gray'
        ax.hist(data, bins=bins, alpha=0.7, edgecolor='black', color=color)
        ax.set_xlabel(xlabel)
        ax.set_ylabel('Count')
        ax.set_title(title)
        ax.grid(True, alpha=0.3)

    def _plot_line_variation(self, ax, data, direction, data_type, title):
        """Plot the variation of a quantity along one axis."""
        nodes = self.geometry.nodes
        if direction == 'x':
            coord, xlabel = nodes[:, 0], 'X coordinate'
        elif direction == 'y':
            coord, xlabel = nodes[:, 1], 'Y coordinate'
        else:  # 'z'
            coord, xlabel = nodes[:, 2], 'Z coordinate'
        # Sort by coordinate
        sort_indices = np.argsort(coord)
        ax.scatter(coord[sort_indices], data[sort_indices], alpha=0.6, s=1)
        ax.set_xlabel(xlabel)
        if data_type == 'pressure':
            ax.set_ylabel('Pressure coefficient Cp')
        elif data_type == 'velocity':
            ax.set_ylabel('Velocity magnitude (m/s)')
        else:
            ax.set_ylabel('Value')
        ax.set_title(title)
        ax.grid(True, alpha=0.3)

    def _plot_windward_leeward_comparison(self, ax, data, data_type, title):
        """Compare windward and leeward surfaces."""
        impact_angles = self.results['flow_field']['impact_angles']
        # Key fix: the impact-angle array must match the data array in length;
        # if not, resample or interpolate to align them
        if len(impact_angles) != len(data):
            print(f"⚠️ Impact-angle length ({len(impact_angles)}) does not match "
                  f"data length ({len(data)}); aligning...")
            # Option 1: impact_angles is longer, downsample to the data length
            if len(impact_angles) > len(data):
                step = len(impact_angles) // len(data)
                impact_angles = impact_angles[::step][:len(data)]
            # Option 2: data is longer, linearly interpolate impact_angles
            else:
                from scipy.interpolate import interp1d
                x_old = np.linspace(0, 1, len(impact_angles))
                x_new = np.linspace(0, 1, len(data))
                f = interp1d(x_old, impact_angles, kind='linear')
                impact_angles = f(x_new)

        # Split windward and leeward
        windward_mask = impact_angles > 0
        leeward_mask = impact_angles <= 0
        windward_data = data[windward_mask]
        leeward_data = data[leeward_mask]

        # Overlaid histograms
        ax.hist(windward_data, bins=30, alpha=0.7, label='Windward', color='red')
        ax.hist(leeward_data, bins=30, alpha=0.7, label='Leeward', color='blue')
        if data_type == 'pressure':
            ax.set_xlabel('Pressure coefficient Cp')
        elif data_type == 'velocity':
            ax.set_xlabel('Velocity magnitude (m/s)')
        else:
            ax.set_xlabel('Value')
        ax.set_ylabel('Count')
        ax.set_title(title)
        ax.legend()
        ax.grid(True, alpha=0.3)

    def _plot_3d_scalar_field(self, ax, data, data_type, title):
        """Plot a 3D scalar field."""
        nodes = self.geometry.nodes
        # Colormap selection
        if data_type == 'pressure':
            cmap = 'RdYlBu_r'
        elif data_type == 'velocity':
            cmap = 'plasma'
        elif data_type == 'mach':
            cmap = 'jet'
        elif data_type == 'temperature':
            cmap = 'hot'
        elif data_type == 'pressure_ratio':
            cmap = 'RdYlBu_r'
        else:
            cmap = 'viridis'
        # 3D scatter plot
        scatter = ax.scatter(nodes[:, 0], nodes[:, 1], nodes[:, 2],
                             c=data, cmap=cmap, s=2, alpha=0.6)
        plt.colorbar(scatter, ax=ax, shrink=0.5, aspect=20)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        ax.set_title(title)

    def _plot_3d_streamlines(self, ax, streamlines, max_lines=20):
        """Plot 3D streamlines."""
        count = 0
        for node_idx, coordinates in streamlines.items():
            if count >= max_lines:
                break
            if len(coordinates) > 1:
                coords_array = np.array(coordinates)
                ax.plot(coords_array[:, 0], coords_array[:, 1], coords_array[:, 2],
                        alpha=0.7, linewidth=1)
                count += 1
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        ax.set_title(f'3D surface streamlines ({count} shown)')

    def _plot_2d_streamlines(self, ax, streamlines, view_type='side', max_lines=30):
        """Plot 2D streamline projections."""
        count = 0
        for node_idx, coordinates in streamlines.items():
            if count >= max_lines:
                break
            if len(coordinates) > 1:
                coords_array = np.array(coordinates)
                if view_type == 'top':
                    x, y = coords_array[:, 0], coords_array[:, 1]
                    xlabel, ylabel = 'X', 'Y'
                elif view_type == 'side':
                    x, y = coords_array[:, 0], coords_array[:, 2]
                    xlabel, ylabel = 'X', 'Z'
                else:  # front
                    x, y = coords_array[:, 1], coords_array[:, 2]
                    xlabel, ylabel = 'Y', 'Z'
                ax.plot(x, y, alpha=0.7, linewidth=1)
                count += 1
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.set_title(f'2D streamline projection ({count} shown)')
        ax.set_aspect('equal')

    def _plot_shock_strength_analysis(self, ax, shock_data):
        """Plot a shock-strength breakdown."""
        pressures = shock_data['node_data']['pressures']
        initial_pressure = self.flow_conditions['pressure']
        pressure_ratios = pressures / initial_pressure
        # Classify: compression, expansion, unchanged
        compression_mask = pressure_ratios > 1.05
        expansion_mask = pressure_ratios < 0.95
        unchanged_mask = (pressure_ratios >= 0.95) & (pressure_ratios <= 1.05)
        compression_count = np.sum(compression_mask)
        expansion_count = np.sum(expansion_mask)
        unchanged_count = np.sum(unchanged_mask)
        labels = ['Compression', 'Expansion', 'Unchanged']
        counts = [compression_count, expansion_count, unchanged_count]
        colors = ['red', 'blue', 'gray']
        ax.pie(counts, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
        ax.set_title('Shock-effect distribution')

    def _plot_force_coefficients(self, ax, aero_data):
        """Plot the aerodynamic force coefficients."""
        coeffs = aero_data['coefficients']
        labels = ['CL', 'CD', 'CY']
        values = [coeffs['CL'], coeffs['CD'], coeffs['CY']]
        colors = ['blue', 'red', 'green']
        bars = ax.bar(labels, values, color=colors, alpha=0.7)
        ax.set_ylabel('Coefficient')
        ax.set_title('Force coefficients')
        ax.grid(True, alpha=0.3)
        # Value labels on the bars
        for bar, value in zip(bars, values):
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width() / 2., height + height * 0.01,
                    f'{value:.6f}', ha='center', va='bottom')

    def _plot_force_components(self, ax, aero_data):
        """Compare pressure and viscous force components."""
        pressure_forces = aero_data['pressure_forces']
        viscous_forces = aero_data['viscous_forces']
        components = ['Fx', 'Fy', 'Fz']
        pressure_values = [pressure_forces['fx'], pressure_forces['fy'], pressure_forces['fz']]
        viscous_values = [viscous_forces['fx'], viscous_forces['fy'], viscous_forces['fz']]
        x = np.arange(len(components))
        width = 0.35
        ax.bar(x - width / 2, pressure_values, width, label='Pressure force',
               alpha=0.7, color='blue')
        ax.bar(x + width / 2, viscous_values, width, label='Viscous force',
               alpha=0.7, color='red')
        ax.set_xlabel('Force component')
        ax.set_ylabel('Force (N)')
        ax.set_title('Force-component comparison')
        ax.set_xticks(x)
        ax.set_xticklabels(components)
        ax.legend()
        ax.grid(True, alpha=0.3)

    def _plot_moment_coefficients(self, ax, aero_data):
        """Plot the moment coefficients."""
        coeffs = aero_data['coefficients']
        labels = ['Cl', 'Cm', 'Cn']
        values = [coeffs['Cl'], coeffs['Cm'], coeffs['Cn']]
        colors = ['orange', 'purple', 'brown']
        bars = ax.bar(labels, values, color=colors, alpha=0.7)
        ax.set_ylabel('Coefficient')
        ax.set_title('Moment coefficients')
        ax.grid(True, alpha=0.3)
        # Value labels on the bars
        for bar, value in zip(bars, values):
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width() / 2., height + height * 0.01,
                    f'{value:.6f}', ha='center', va='bottom')

    def _save_figure(self, fig, name):
        """Save a figure to disk."""
        try:
            import os
            from pathlib import Path
            # Create the image directory
            img_dir = os.path.join("results", "images")
            Path(img_dir).mkdir(parents=True, exist_ok=True)
            # Save the figure with a timestamped name
            filename = f"{name}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png"
            filepath = os.path.join(img_dir, filename)
            fig.savefig(filepath, dpi=self.config.get('dpi', 300), bbox_inches='tight')
            self.logger(f"Figure saved: {filepath}")
        except Exception as e:
            self.logger(f"❌ Failed to save figure: {str(e)}")

How do I fix this error: 'c' argument has 5110 elements, which is inconsistent with 'x' and 'y' with size 4909?
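That matplotlib error means the color array passed as c has one value per surface element (5110) while x and y were taken from the mesh nodes (4909), most likely in a call like the scatter in _plot_3d_scalar_field, which always uses self.geometry.nodes regardless of where the data lives. The robust fix is either to pick the coordinate set whose length actually matches the data, or to average per-element values onto the nodes before plotting. A minimal sketch under those assumptions follows (names mirror the class above; face_centers is assumed to hold one center per element):

import numpy as np

def element_data_to_nodes(nodes, elements, elem_values):
    # Average per-element values onto the nodes each element touches, so the
    # result has exactly one value per node and can be plotted against nodes.
    node_sum = np.zeros(len(nodes))
    node_cnt = np.zeros(len(nodes))
    for value, elem in zip(elem_values, elements):
        for node_idx in elem:
            node_sum[node_idx] += value
            node_cnt[node_idx] += 1
    node_cnt[node_cnt == 0] = 1  # guard against nodes used by no element
    return node_sum / node_cnt

def coords_for(data, geometry):
    # Pick the coordinate array whose length matches the data exactly;
    # refuse to guess when neither matches instead of silently truncating.
    if len(data) == len(geometry.face_centers):
        return geometry.face_centers  # per-element data -> face centers
    if len(data) == len(geometry.nodes):
        return geometry.nodes         # per-node data -> nodes
    raise ValueError(
        f"data length {len(data)} matches neither the "
        f"{len(geometry.face_centers)} face centers nor the {len(geometry.nodes)} nodes")

Applied to _plot_3d_scalar_field, that means calling coords = coords_for(data, self.geometry) before the scatter, or data = element_data_to_nodes(self.geometry.nodes, self.geometry.elements, data) when a node-based plot is required. Note that truncating both sides to min_length, as plot_pressure_distribution currently does, makes the lengths agree but silently pairs values with the wrong coordinates; matching by length (or averaging onto nodes) keeps every value at its own location.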