Hands-On Python Project: Optimizing a LightGBM Regression Model (LGBMRegressor) with a Bayesian Optimizer (Bayes_opt)

Note: This is a hands-on machine learning project (with data + code + documentation + video walkthrough). See the end of the article for how to obtain these materials.




1. Project Background

A Bayesian optimizer (BayesianOptimization) is a black-box optimizer used to search for optimal parameters.

The Bayesian optimizer used here performs Bayesian optimization based on a Gaussian process; it suits parameter spaces with many continuous parameters and has a relatively short running time.

The objective function of a Bayesian optimizer must take concrete hyper-parameter values as input; it cannot take the whole hyper-parameter space, let alone elements other than hyper-parameters, such as data or algorithms.

This project uses a Bayesian optimizer (Bayes_opt) to tune a LightGBM regression algorithm (LGBMRegressor) and solve a regression problem.

2. Data Acquisition

The modeling data come from the internet (compiled by the author of this project); the data fields are summarized below:

A partial view of the data:

3. Data Preprocessing

3.1 Inspecting the Data with Pandas

Use the Pandas head() method to view the first five rows of the data:

Key code:
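The key code itself is not included in the text, so the sketch below is a minimal reconstruction; the project's real CSV file is not available here, so a small synthetic DataFrame (with hypothetical column names) stands in for it.

```python
import pandas as pd

# The real data file is not shown in the article; in the project this
# would be something like: df = pd.read_csv('data.csv')  (hypothetical name)
df = pd.DataFrame({
    'x1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'y':  [12.5, 20.1, 33.7, 41.2, 48.9, 61.0],
})

print(df.head())  # head() returns the first five rows by default
```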

3.2 Checking for Missing Values

Use the Pandas info() method to view the data summary:

As the output above shows, there are 9 variables and 1000 rows in total, with no missing values.

Key code:
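A minimal sketch of this step on a small synthetic frame; `info()` reports each column's dtype, its non-null count, and memory usage, which is how the "no missing values" observation above is made.

```python
import io
import pandas as pd

df = pd.DataFrame({'x1': range(10), 'y': range(10)})

# info() prints to stdout by default; capture it here to show the output
buf = io.StringIO()
df.info(buf=buf)
print(buf.getvalue())
```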

3.3 Descriptive Statistics

Use the Pandas describe() method to view the mean, standard deviation, minimum, quartiles, and maximum of the data.

Key code:
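A minimal sketch of the `describe()` call on synthetic data:

```python
import pandas as pd

df = pd.DataFrame({'x1': [1.0, 2.0, 3.0, 4.0]})

# describe() returns count, mean, std, min, 25%/50%/75% quartiles, and max
stats = df.describe()
print(stats)
```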

4. Exploratory Data Analysis

4.1 Histogram of the y Variable

Use the Matplotlib hist() method to draw a histogram:

As the figure above shows, the y variable is mostly concentrated between -200 and 200.

4.2 Correlation Analysis

In the figure above, a larger absolute value means a stronger correlation; positive values indicate positive correlation and negative values indicate negative correlation.
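The article's correlation figure is not reproduced here; the sketch below shows the underlying computation as a numeric correlation matrix via `DataFrame.corr()`, on synthetic features constructed to be strongly positively and negatively correlated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    'x1': x1,
    'x2': 2.0 * x1 + rng.normal(scale=0.1, size=200),  # strongly positive
    'y': -x1 + rng.normal(scale=0.1, size=200),        # strongly negative
})

corr = df.corr()          # Pearson correlation by default
print(corr.round(2))
```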

5. Feature Engineering

5.1 Building Feature and Label Data

Key code:
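A minimal sketch of this step; the project's column names are not shown, so a label column named 'y' is assumed here.

```python
import pandas as pd

# Hypothetical frame; 'y' is assumed to be the label column.
df = pd.DataFrame({'x1': [1, 2, 3], 'x2': [4, 5, 6], 'y': [7, 8, 9]})

X = df.drop(columns='y')   # feature matrix: every column except the label
y = df['y']                # label vector
```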

5.2 Splitting the Dataset

Use train_test_split() to split the data into an 80% training set and a 20% test set. Key code:
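A sketch of the 80/20 split on synthetic arrays; `test_size=0.2` gives the 20% test set, and the fixed `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 samples, 2 features (synthetic)
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```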

6. Building the Bayesian-Optimizer-Tuned LightGBM Regression Model

This section uses a Bayesian optimizer to tune the LightGBM regression algorithm for the target regression task.

6.1 Building the Tuning Model

6.2 Best Parameters

Progress information from the search:

Best parameter results:

Best parameter combination:

num_leaves: 22

n_estimators: 489

learning_rate: 0.07

Best score: 0.9320547016980949

Validation-set score: 0.8380583978630435

6.3 Building the Model with the Best Parameters

7. Model Evaluation

7.1 Evaluation Metrics and Results

The evaluation metrics include explained variance, mean absolute error, mean squared error, and the R-squared (R^2) value.

As the table above shows, the R-squared value is 0.9411, indicating a good model fit.

Key code:
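The four metrics named above map directly onto scikit-learn functions; a sketch on a small hand-made example, since the project's actual predictions are not shown:

```python
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# Hand-made true/predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print('explained variance:', explained_variance_score(y_true, y_pred))
print('MAE:', mean_absolute_error(y_true, y_pred))
print('MSE:', mean_squared_error(y_true, y_pred))
print('R^2:', r2_score(y_true, y_pred))
```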

7.2 True vs. Predicted Values

As the figure above shows, the fluctuations of the true and predicted values track each other closely, indicating a good model fit.

8. Conclusion and Outlook

In summary, this article used a Bayesian optimizer to search for the optimal parameter values of a LightGBM regression model and built the final regression model with them; the resulting model performs well and can be used for day-to-day product prediction.


# The data, code, documentation, and video walkthrough for this project are available here:

# Project notes:

# Link: https://pan.baidu.com/s/11ONOpsSVV9S8dW7e_9h2yA
# Access code: et9f

For more hands-on projects, see the machine learning project collection list on 优快云: 机器学习项目实战合集列表-优快云博客


Bayesian optimization is an optimization algorithm for finding the maximum or minimum of a black-box function. In machine learning, it can be used to tune a model's hyper-parameters. In a LightGBM model, feature handling is an important step, and Bayesian optimization can be used to tune the related parameters, such as the feature sampling rate. Below is an example of using Bayesian optimization to tune LightGBM and then extract feature importances:

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from bayes_opt import BayesianOptimization

# Load the dataset
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Cross-validation objective for the Bayesian optimizer
def lgb_cv(num_leaves, feature_fraction, bagging_fraction, max_depth,
           min_split_gain, min_child_weight):
    params = {
        'objective': 'binary',
        'metric': 'auc',
        'num_leaves': int(num_leaves),
        'feature_fraction': max(min(feature_fraction, 1), 0),
        'bagging_fraction': max(min(bagging_fraction, 1), 0),
        'max_depth': int(max_depth),
        'min_split_gain': min_split_gain,
        'min_child_weight': min_child_weight,
        'verbose': -1,
        'seed': 42,
    }
    cv_result = lgb.cv(params, lgb.Dataset(X_train, y_train),
                       num_boost_round=1000, nfold=5, stratified=False,
                       shuffle=True, metrics=['auc'],
                       callbacks=[lgb.early_stopping(50, verbose=False)])
    # The result key is 'auc-mean' in older LightGBM, 'valid auc-mean' in 4.x
    auc_key = [k for k in cv_result if k.endswith('auc-mean')][0]
    return cv_result[auc_key][-1]

# Parameter space for the Bayesian optimizer
lgbBO = BayesianOptimization(lgb_cv, {
    'num_leaves': (24, 45),
    'feature_fraction': (0.1, 0.9),
    'bagging_fraction': (0.8, 1),
    'max_depth': (5, 15),
    'min_split_gain': (0.001, 0.1),
    'min_child_weight': (5, 50),
})

# Run the optimization
lgbBO.maximize(init_points=5, n_iter=25)

# Train a final model with the best parameters found
params = lgbBO.max['params']
params['num_leaves'] = int(params['num_leaves'])
params['max_depth'] = int(params['max_depth'])
params.update({'verbose': -1, 'objective': 'binary', 'metric': 'auc',
               'boosting_type': 'gbdt', 'seed': 42})
gbm = lgb.train(params, lgb.Dataset(X_train, y_train), num_boost_round=1000)

# Print each feature's importance
for feature_name, importance in zip(data.feature_names,
                                    gbm.feature_importance()):
    print(feature_name, ':', importance)
```

The code above uses the BayesianOptimization library to implement Bayesian optimization. The lgb_cv function trains a LightGBM model by cross-validation and returns the final AUC, which the optimizer maximizes. The search space covers num_leaves, feature_fraction, bagging_fraction, max_depth, min_split_gain, and min_child_weight. maximize() then runs the optimization with 5 initial points and 25 iterations. Finally, a model is trained with the best parameters found, and each feature's importance is printed.
Author: 张陈亚
