Based on the `new_algorithm.py` and `tempcoderunnerfile.py` files you provided, I have reworked `app.py` so that it integrates cleanly with the `GasSensorDataAnalyzer` and `AlgorithmSelector` classes and supports loading data from two uploaded files for multi-dimensional analysis.
Here is the complete, revised `app.py`:
```python
from flask import Flask, request, jsonify, send_from_directory
from flask_cors import CORS
import os
import numpy as np
import pandas as pd
import logging
import time
from werkzeug.utils import secure_filename
from new_algorithm import GasSensorDataAnalyzer, AlgorithmSelector, detect_dataset_type, extract_gas_type, extract_concentration, extract_sensor_type
# 配置日志
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.StreamHandler(), # 输出到控制台
logging.FileHandler('app.log', encoding='utf-8') # 输出到文件
]
)
logger = logging.getLogger(__name__)
app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024 # 100MB 文件大小限制
# 配置CORS允许Vue前端访问
CORS(app, resources={
r"/*": {
"origins": "http://localhost:5177", # 指定Vue前端地址
"methods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization"],
"supports_credentials": True
}
})
# 配置上传文件夹
UPLOAD_FOLDER = 'uploads'
if not os.path.exists(UPLOAD_FOLDER):
os.makedirs(UPLOAD_FOLDER)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# 允许的文件扩展名
ALLOWED_EXTENSIONS = {'csv', 'xlsx', 'xls'}
# 全局数据集存储
data_analyzer = GasSensorDataAnalyzer()
X_combined = None
y_combined = None
gas_types = []
concentrations = []
sensor_types = []
last_activity = time.time()
algorithm_selector = AlgorithmSelector(use_chinese=True)
def allowed_file(filename):
"""检查文件扩展名是否合法"""
return '.' in filename and \
filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
def save_and_extract_file_info(file):
"""保存文件并提取气体信息"""
try:
filename = secure_filename(file.filename)
file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
file.save(file_path)
logger.info(f"Saved file: {file_path}")
# 从文件名提取信息
sensor_type = extract_sensor_type(filename)
gas_type = extract_gas_type(filename)
concentration = extract_concentration(filename)
return file_path, sensor_type, gas_type, concentration
except Exception as e:
logger.error(f"Error processing file {file.filename}: {str(e)}", exc_info=True)
return None, None, None, None
@app.route('/')
def index():
"""健康检查端点"""
global last_activity
status = {
'status': 'running',
'version': '1.0.0',
'last_activity': time.ctime(last_activity),
'endpoints': {
'/upload': 'POST - Upload data files',
'/analyze': 'POST - Analyze data',
'/reset': 'POST - Reset data',
'/status': 'GET - Service status'
}
}
logger.info(f"Status request: {status}")
return jsonify(status)
@app.route('/status')
def status():
"""服务状态检查"""
global X_combined, last_activity
return jsonify({
'status': 'active',
'timestamp': time.time(),
'dataset_loaded': X_combined is not None,
'dataset_shape': X_combined.shape if X_combined is not None else None
})
@app.route('/upload', methods=['POST'])
def upload_files():
"""处理文件上传"""
global X_combined, y_combined, gas_types, concentrations, sensor_types, last_activity
logger.info("Received upload request")
# 检查是否有文件
if 'files' not in request.files:
logger.error("No file part in request")
return jsonify({'error': 'No file part'}), 400
files = request.files.getlist('files')
if len(files) == 0 or files[0].filename == '':
logger.error("No selected files")
return jsonify({'error': 'No selected files'}), 400
# 过滤合法文件
valid_files = [f for f in files if allowed_file(f.filename)]
if len(valid_files) < 2:
logger.error("Need at least two files for analysis")
return jsonify({'error': 'Need at least two files for analysis'}), 400
# 保存文件并提取信息
file_paths = []
extracted_info = []
for file in valid_files:
file_path, sensor_type, gas_type, concentration = save_and_extract_file_info(file)
if file_path:
file_paths.append(file_path)
extracted_info.append({
'sensor_type': sensor_type,
'gas_type': gas_type,
'concentration': concentration
})
if len(file_paths) < 2:
logger.error("Failed to process enough files")
return jsonify({'error': 'Failed to process files. Need at least two valid files.'}), 500
# 准备数据加载参数
sensor_types = [info['sensor_type'] for info in extracted_info]
gas_types = [info['gas_type'] for info in extracted_info]
concentrations = [info['concentration'] for info in extracted_info]
# 加载数据
try:
X_combined, y_combined = data_analyzer.load_multiple_gas_data(
file_paths, gas_types, concentrations, sensor_types
)
if X_combined is None or len(X_combined) == 0:
logger.error("Failed to load data from files")
return jsonify({'error': 'Failed to load data. Check file content.'}), 500
logger.info(f"Loaded combined data: {len(X_combined)} samples, {X_combined.shape[1]} features")
# 获取多维度标签信息
label_info = []
for label in np.unique(y_combined):
for key, label_id in data_analyzer.multi_dimension_labels.items():
if label_id == label:
parts = key.split('_')
sensor = parts[0]
gas = parts[1]
conc = parts[2].replace('ppm', '')
label_info.append({
'id': int(label),
'sensor': sensor,
'gas': gas,
'concentration': conc,
'name_cn': data_analyzer.get_or_create_multi_dimension_label(sensor, gas, int(conc))[1]['cn'],
'name_en': data_analyzer.get_or_create_multi_dimension_label(sensor, gas, int(conc))[1]['en']
})
# 更新最后活动时间
last_activity = time.time()
# 返回成功响应
response = {
'message': f'Successfully uploaded and merged {len(file_paths)} files',
'sample_count': len(X_combined),
'gas_types': gas_types,
'concentrations': concentrations,
'sensor_types': sensor_types,
'label_info': label_info,
'num_classes': len(np.unique(y_combined))
}
return jsonify(response), 200
except Exception as e:
logger.error(f"Error loading data: {str(e)}", exc_info=True)
return jsonify({'error': f'Error loading data: {str(e)}'}), 500
@app.route('/analyze', methods=['POST'])
def analyze_data():
"""执行数据分析"""
global X_combined, y_combined, algorithm_selector, last_activity
logger.info("Received analyze request")
# 检查数据是否已加载
if X_combined is None or y_combined is None:
logger.error("No dataset available")
return jsonify({'error': 'No data available. Please upload files first.'}), 400
# 获取前端传递的算法参数
try:
data = request.get_json()
if not data or 'params' not in data:
logger.error("Invalid request parameters")
return jsonify({'error': 'Invalid request parameters'}), 400
params = data.get('params', {})
# 设置算法参数
for algo_name, algo_params in params.items():
if algo_name in algorithm_selector.algorithms:
algorithm_selector.set_algorithm_params(algo_name, algo_params)
logger.info(f"Set parameters for {algo_name}: {algo_params}")
except Exception as e:
logger.error(f"Error parsing JSON data: {str(e)}", exc_info=True)
return jsonify({'error': 'Invalid JSON data'}), 400
# 训练模型
try:
results = algorithm_selector.train_models(X_combined, y_combined)
logger.info("Algorithm training completed")
except Exception as e:
logger.error(f"Error training models: {str(e)}", exc_info=True)
return jsonify({'error': f'Error training models: {str(e)}'}), 500
# 提取需要返回的结果
response_results = {}
for algo_name, result in results.items():
# 如果训练出错,记录错误信息
if 'error' in result:
response_results[algo_name] = {
'name': result['name'],
'error': result['error']
}
else:
response_results[algo_name] = {
'name': result['name'],
'train_accuracy': result['train_accuracy'],
'test_accuracy': result['test_accuracy'],
'classification_report': result.get('classification_report', 'No report'),
'feature_importances': result.get('feature_importances', None)
}
# 更新最后活动时间
last_activity = time.time()
return jsonify({
'message': 'Analysis completed successfully',
'results': response_results
}), 200
@app.route('/reset', methods=['POST'])
def reset_data():
"""重置数据集"""
global X_combined, y_combined, gas_types, concentrations, sensor_types, last_activity
X_combined = None
y_combined = None
gas_types = []
concentrations = []
sensor_types = []
last_activity = time.time()
logger.info("Data reset")
return jsonify({'message': 'Data reset successfully'}), 200
@app.route('/uploads/<filename>', methods=['GET'])
def uploaded_file(filename):
"""访问上传的文件"""
return send_from_directory(app.config['UPLOAD_FOLDER'], filename)
if __name__ == '__main__':
# 开发环境 - 允许Vue前端访问
app.run(host='0.0.0.0', port=5000, debug=True)
```
### Key Changes and Integration Notes
1. **Import the helpers from `new_algorithm.py`**:
```python
from new_algorithm import GasSensorDataAnalyzer, AlgorithmSelector, detect_dataset_type, extract_gas_type, extract_concentration, extract_sensor_type
```
2. **Global state initialization**:
```python
data_analyzer = GasSensorDataAnalyzer()
X_combined = None
y_combined = None
algorithm_selector = AlgorithmSelector(use_chinese=True)
```
3. **File handling and metadata extraction**:
```python
def save_and_extract_file_info(file):
"""保存文件并提取气体信息"""
try:
filename = secure_filename(file.filename)
file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
file.save(file_path)
# 从文件名提取信息
sensor_type = extract_sensor_type(filename)
gas_type = extract_gas_type(filename)
concentration = extract_concentration(filename)
return file_path, sensor_type, gas_type, concentration
except Exception as e:
logger.error(f"Error processing file {file.filename}: {str(e)}", exc_info=True)
return None, None, None, None
```
4. **Multi-file data loading**:
```python
# 保存文件并提取信息
file_paths = []
extracted_info = []
for file in valid_files:
file_path, sensor_type, gas_type, concentration = save_and_extract_file_info(file)
if file_path:
file_paths.append(file_path)
extracted_info.append({
'sensor_type': sensor_type,
'gas_type': gas_type,
'concentration': concentration
})
# 准备数据加载参数
sensor_types = [info['sensor_type'] for info in extracted_info]
gas_types = [info['gas_type'] for info in extracted_info]
concentrations = [info['concentration'] for info in extracted_info]
# 加载数据
X_combined, y_combined = data_analyzer.load_multiple_gas_data(
file_paths, gas_types, concentrations, sensor_types
)
```
5. **Label information extraction**:
```python
# 获取多维度标签信息
label_info = []
for label in np.unique(y_combined):
for key, label_id in data_analyzer.multi_dimension_labels.items():
if label_id == label:
parts = key.split('_')
sensor = parts[0]
gas = parts[1]
conc = parts[2].replace('ppm', '')
label_info.append({
'id': int(label),
'sensor': sensor,
'gas': gas,
'concentration': conc,
'name_cn': data_analyzer.get_or_create_multi_dimension_label(sensor, gas, int(conc))[1]['cn'],
'name_en': data_analyzer.get_or_create_multi_dimension_label(sensor, gas, int(conc))[1]['en']
})
```
6. **Algorithm parameter setup and training** (a client-side request sketch follows this list):
```python
# 设置算法参数
for algo_name, algo_params in params.items():
if algo_name in algorithm_selector.algorithms:
algorithm_selector.set_algorithm_params(algo_name, algo_params)
# 训练模型
results = algorithm_selector.train_models(X_combined, y_combined)
```
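For completeness, here is a minimal client-side sketch of the JSON body that `/analyze` expects, using the `requests` library. It assumes the Flask app above is running at `http://localhost:5000` and that `/upload` has already succeeded; the parameter values are simply the defaults from the Vue example below.
```python
import requests

# Hypothetical client call; mirrors the {'params': {...}} structure read by /analyze.
payload = {
    "params": {
        "knn": {"n_neighbors": 5, "metric": "euclidean"},
        "svm": {"C": 1.0, "kernel": "rbf"},
        "random_forest": {"n_estimators": 100, "max_depth": None},
    }
}

resp = requests.post("http://localhost:5000/analyze", json=payload, timeout=600)
resp.raise_for_status()

for algo, result in resp.json()["results"].items():
    if "error" in result:
        print(f"{result['name']}: training failed - {result['error']}")
    else:
        print(f"{result['name']}: test accuracy = {result['test_accuracy']:.4f}")
```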
### Frontend Example (Vue)
```vue
<template>
<div class="container">
<h1>气体传感器数据分析</h1>
<!-- 文件上传 -->
<div class="section">
<h2>上传数据文件</h2>
<input type="file" multiple @change="handleFileChange" ref="fileInput">
<button @click="uploadFiles">上传文件</button>
<p v-if="uploadStatus">{{ uploadStatus }}</p>
<div v-if="labelInfo.length > 0">
<h3>数据集标签信息</h3>
<ul>
<li v-for="label in labelInfo" :key="label.id">
{{ label.name_cn }} ({{ label.sensor }}, {{ label.gas }}, {{ label.concentration }}ppm)
</li>
</ul>
</div>
</div>
<!-- 数据分析 -->
<div class="section" v-if="labelInfo.length > 0">
<h2>数据分析</h2>
<div class="algorithm-params">
<div v-for="(algo, key) in algorithms" :key="key">
<h3>{{ algo.name.cn }}</h3>
<div v-for="param in algo.params" :key="param.name">
<label>
{{ param.label }}:
<input :type="param.type" v-model="params[key][param.name]">
</label>
</div>
</div>
</div>
<button @click="analyzeData">执行分析</button>
<!-- 显示结果 -->
<div v-if="results" class="results">
<div v-for="(result, algo) in results" :key="algo" class="result-card">
<h3>{{ result.name }}</h3>
            <p v-if="result.error">错误: {{ result.error }}</p>
            <template v-else>
              <p>训练准确率: {{ result.train_accuracy.toFixed(4) }}</p>
              <p>测试准确率: {{ result.test_accuracy.toFixed(4) }}</p>
              <pre v-if="result.classification_report">{{ result.classification_report }}</pre>
              <div v-if="result.feature_importances">
                <h4>特征重要性:</h4>
                <ul>
                  <li v-for="(importance, feature) in result.feature_importances" :key="feature">
                    {{ feature }}: {{ importance.toFixed(4) }}
                  </li>
                </ul>
              </div>
            </template>
</div>
</div>
</div>
</div>
</template>
<script>
import axios from 'axios';
export default {
data() {
return {
files: [],
uploadStatus: '',
labelInfo: [],
results: null,
// 算法配置
algorithms: {
knn: {
name: { cn: 'K-近邻算法', en: 'K-Nearest Neighbors' },
params: [
{ name: 'n_neighbors', label: '邻居数量', type: 'number', default: 5 },
{ name: 'metric', label: '距离度量', type: 'text', default: 'euclidean' }
]
},
svm: {
name: { cn: '支持向量机', en: 'Support Vector Machine' },
params: [
{ name: 'C', label: '正则化参数', type: 'number', default: 1.0 },
{ name: 'kernel', label: '核函数', type: 'text', default: 'rbf' }
]
},
random_forest: {
name: { cn: '随机森林', en: 'Random Forest' },
params: [
{ name: 'n_estimators', label: '树的数量', type: 'number', default: 100 },
{ name: 'max_depth', label: '最大深度', type: 'number', default: null }
]
}
},
params: {
knn: { n_neighbors: 5, metric: 'euclidean' },
svm: { C: 1.0, kernel: 'rbf' },
random_forest: { n_estimators: 100, max_depth: null }
}
};
},
methods: {
handleFileChange(event) {
this.files = Array.from(event.target.files);
if (this.files.length < 2) {
this.uploadStatus = '需要至少两个文件进行分析';
} else {
this.uploadStatus = `已选择 ${this.files.length} 个文件`;
}
},
async uploadFiles() {
if (this.files.length < 2) {
this.uploadStatus = '需要至少两个文件进行分析';
return;
}
const formData = new FormData();
this.files.forEach(file => {
formData.append('files', file);
});
try {
const response = await axios.post('http://localhost:5000/upload', formData, {
headers: {
'Content-Type': 'multipart/form-data'
}
});
this.uploadStatus = `上传成功! 样本数: ${response.data.sample_count}`;
this.labelInfo = response.data.label_info;
} catch (error) {
console.error('上传失败:', error);
this.uploadStatus = `上传失败: ${error.response?.data?.error || error.message}`;
}
},
async analyzeData() {
try {
const response = await axios.post('http://localhost:5000/analyze', {
params: this.params
});
this.results = response.data.results;
console.log('分析结果:', this.results);
} catch (error) {
console.error('分析失败:', error);
alert(`分析失败: ${error.response?.data?.error || error.message}`);
}
}
}
};
</script>
<style scoped>
/* 样式保持不变 */
</style>
```
### Key Features
1. **Automatic metadata extraction**:
   - The backend extracts the sensor type, gas type, and concentration directly from each file name
   - Users do not need to enter this information manually
2. **Multi-dimension label system**:
   - Uses the multi-dimension label system of `GasSensorDataAnalyzer`
   - Returns detailed label information to the frontend, covering each sensor/gas/concentration combination (see the upload sketch after this list)
3. **Flexible algorithm configuration**:
   - The `AlgorithmSelector` class supports multiple algorithms and parameter settings
   - The frontend can adjust algorithm parameters dynamically
4. **Error handling and logging**:
   - Comprehensive error handling and logging throughout
   - Detailed error messages are returned to the frontend
5. **Dataset reset**:
   - The `/reset` endpoint clears the currently loaded dataset
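As referenced in item 2 above, the sketch below shows what the upload step returns. It posts two files to `/upload` with the `requests` library and prints the `label_info` entries assembled by the route; the file names are hypothetical examples that follow the naming convention described further down, and the exact label IDs and bilingual names depend on `GasSensorDataAnalyzer`.
```python
import requests

# Hypothetical example files that follow the expected naming convention.
file_names = ["MP2_acetone_20ppm.xlsx", "MP801_toluene_50ppm.xlsx"]

with open(file_names[0], "rb") as f1, open(file_names[1], "rb") as f2:
    # Both parts use the field name 'files', matching request.files.getlist('files').
    files = [("files", (file_names[0], f1)), ("files", (file_names[1], f2))]
    resp = requests.post("http://localhost:5000/upload", files=files, timeout=300)

resp.raise_for_status()
data = resp.json()
print(f"Samples loaded: {data['sample_count']}, classes: {data['num_classes']}")
for label in data["label_info"]:
    # Each entry carries the sensor/gas/concentration combination plus bilingual names.
    print(label["id"], label["sensor"], label["gas"], label["concentration"], label["name_cn"])
```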
### Usage Flow
1. **Upload files**:
   - Select at least two data files (`.csv`, `.xlsx`, or `.xls`)
   - File names should encode the sensor type, gas type, and concentration (for example: `MP2_acetone_20ppm.xlsx`)
2. **Data loading**:
   - The backend automatically extracts the metadata from the uploaded files
   - The datasets are loaded and merged
   - Label information is returned to the frontend (a status-check sketch follows this list)
3. **Data analysis**:
   - Adjust the algorithm parameters (optional)
   - Click the "执行分析" (Run Analysis) button
   - Review the results (accuracy, classification report, feature importances)
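Before step 3, a client can verify that the dataset is still loaded by polling the `/status` endpoint defined above; a minimal sketch, assuming the default local address, is shown below.
```python
import requests

# Quick readiness check against the /status endpoint from app.py.
status = requests.get("http://localhost:5000/status", timeout=10).json()
if status.get("dataset_loaded"):
    print("Dataset ready, shape:", status.get("dataset_shape"))
else:
    print("No dataset loaded yet - call /upload before /analyze.")
```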
### File Naming Convention
To ensure the system can extract the metadata correctly, file names should follow one of the formats below (a parsing sketch follows the examples):
1. **Basic format**:
   ```
   [sensor type]_[gas type]_[concentration]ppm.xlsx
   ```
   Example: `MP2_acetone_20ppm.xlsx`
2. **Mixed gases**:
   ```
   [sensor type]_[gas 1]+[gas 2]_[concentration]ppm.xlsx
   ```
   Example: `MP801_toluene+formaldehyde_50ppm.xlsx`
3. **Chinese file names**:
   ```
   MP2_丙酮_20ppm.xlsx
   MP801_甲苯+甲醛_50ppm.xlsx
   ```
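The actual extraction logic lives in `new_algorithm.py` (`extract_sensor_type`, `extract_gas_type`, `extract_concentration`) and is not reproduced here. Purely as an illustration of the convention above, and not of the library's implementation, a parser for the basic and mixed-gas formats could look roughly like this:
```python
import re

# Illustrative parser for names such as "MP2_acetone_20ppm.xlsx" or
# "MP801_toluene+formaldehyde_50ppm.xlsx". This is an assumption about the
# documented naming convention, NOT the code from new_algorithm.py.
FILENAME_PATTERN = re.compile(
    r"^(?P<sensor>[^_]+)_(?P<gas>.+)_(?P<conc>\d+(?:\.\d+)?)ppm\.(?:csv|xlsx|xls)$",
    re.IGNORECASE,
)

def parse_gas_filename(filename: str):
    """Return (sensor_type, gas_type, concentration) or None if the name does not match."""
    match = FILENAME_PATTERN.match(filename)
    if not match:
        return None
    return match.group("sensor"), match.group("gas"), float(match.group("conc"))

print(parse_gas_filename("MP2_acetone_20ppm.xlsx"))                 # ('MP2', 'acetone', 20.0)
print(parse_gas_filename("MP801_toluene+formaldehyde_50ppm.xlsx"))  # ('MP801', 'toluene+formaldehyde', 50.0)
```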
This revised `app.py` fully integrates the functionality from `new_algorithm.py` and streamlines the file-handling flow, so that selecting two files yields a correct analysis. Note that the dataset is kept in module-level globals, so this setup is intended for single-user development use; concurrent users would overwrite each other's data, and a production deployment would need per-session storage.