Implementing the K2 Algorithm for Bayesian Networks and an Incremental Variant (Based on MATLAB FullBNT-1.0.7)

Jamie_Wu

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/ibelieve8013/article/details/80203746

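K2 is a classic algorithm for learning the structure of a Bayesian network. In essence it combines a greedy hill-climbing search with a Bayesian scoring function. This post walks through how the algorithm works, using the Bayes Net Toolbox (FullBNT), and then describes a modification to K2 that borrows the incremental idea from the paper Yasin A, Leray P. iMMPC: a local search approach for incremental Bayesian network structure learning[C]// International Symposium on Intelligent Data Analysis. Springer Berlin Heidelberg, 2011:401-412. The goal of the modification is to make structure learning noticeably faster when a large amount of data is involved.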
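To make the "Bayesian scoring" half of K2 concrete, here is a minimal sketch of the original Cooper and Herskovits metric for one node, computed in log space: for each parent configuration j, add log (r-1)! - log (N_ij + r - 1)! + sum_k log N_ijk!, where r is the number of states of the child and N_ijk are the counts. The toy data and all variable names are made up for illustration, and the score FullBNT computes internally may use a different prior, so the numbers need not match the toolbox.

```matlab
% Toy data in the FullBNT layout: data(i,m) = state of node i in case m (1-based).
data = [1 1 2 2 1 2 1 2;    % node 1
        1 1 2 2 1 2 2 1];   % node 2, agrees with node 1 in 6 of 8 cases
ns = [2 2];                 % number of states of each node
ncases = size(data, 2);

child = 2;
r = ns(child);
candidates = {[], 1};       % parent sets to compare: no parent vs. node 1
log_score = zeros(1, numel(candidates));

for c = 1:numel(candidates)
    ps = candidates{c};
    % Map each case to the index of its joint parent configuration.
    if isempty(ps)
        cfg = ones(1, ncases);
        q = 1;
    else
        strides = cumprod([1, ns(ps(1:end-1))]);
        cfg = 1 + strides * (data(ps, :) - 1);
        q = prod(ns(ps));
    end
    % N(j,k) = number of cases with parent configuration j and child state k.
    N = accumarray([cfg', data(child, :)'], 1, [q, r]);
    % Log of the K2 metric (gammaln(x+1) = log(x!)).
    log_score(c) = sum(gammaln(r) - gammaln(sum(N, 2) + r) ...
                       + sum(gammaln(N + 1), 2));
end

disp(exp(log_score));       % 1/630 without a parent, 1/400 with node 1 as parent
```

On this toy data the parent set {1} scores higher than the empty set, so a greedy step would add node 1 as a parent of node 2.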

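The idea is actually very simple. We first use K2 to learn a base structure, and while it runs we save the search path, that is, the decision the algorithm makes at each step. So where is my change? I save not only the best choice at each step but also a few runner-up choices (4 candidates per step in total, counting the best one), and these saved candidates become the search space for the next run. Note that this rests on an assumption, which is also the weakness of the algorithm: if a decision turns out not to be optimal once new data arrives, the optimal choice is assumed to still be among the few highest-scoring candidates. The algorithm therefore discards the low-scoring models, which shrinks the search space and makes the search faster.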
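One way to picture the saved path is as a struct array with one entry per greedy step, each holding the 4 best candidates in score order. The sketch below is only an illustration: the field names, the candidate nodes, and the scores are all hypothetical and are not taken from the author's implementation.

```matlab
k = 4;                                         % candidates kept per step (best + 3 runners-up)

% Hypothetical outcome of one greedy step for child node 6:
cand_nodes  = [1 2 3 4 5];                     % candidate parents that were scored
cand_scores = [-120.4 -98.2 -101.7 -130.0 -99.5];

% Rank the candidates and keep the k best.
[sorted_scores, idx] = sort(cand_scores, 'descend');
keep = idx(1:min(k, numel(idx)));

% Record the step; later steps would be appended with search_path(end+1) = step.
step = struct('child', 6, ...
              'shortlist', cand_nodes(keep), ...
              'scores', sorted_scores(1:numel(keep)));
search_path = step;

disp(search_path(1).shortlist);                % 2 5 3 1, best candidate first
```

The first element of each shortlist is the decision plain K2 would have taken. The remaining three are what the incremental pass is allowed to reconsider.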

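As the figure shows, the left side is the first run of the algorithm, in which every step saves 4 candidate moves. After new data arrives, the incremental algorithm on the right is used instead, and the search space at each step is much smaller (only 4 choices, so it is hard not to be fast).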
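The two passes can be sketched for a single greedy step as follows. This is a simplified illustration under stated assumptions, not the author's code: it only considers adding one parent to an empty parent set, it skips K2's comparison against the current score, it uses the original K2 metric rather than FullBNT's scoring routine, and it assumes the score is recomputed on the old and new data together. The data are random and all names are made up.

```matlab
rng(0);                                  % reproducible toy data
ns = [2 2 2 2 2 2];                      % six binary nodes
old_data = randi(2, 6, 200);             % data(i,m): state of node i in case m
new_data = randi(2, 6, 50);              % batch that arrives later
child = 6;  preds = 1:5;  k = 4;         % preds: nodes before the child in the ordering

% Count table for (parent p, child) and log K2 metric of a count table.
counts = @(D, p) accumarray([D(p, :)', D(child, :)'], 1, [ns(p), ns(child)]);
k2log  = @(N) sum(gammaln(size(N, 2)) - gammaln(sum(N, 2) + size(N, 2)) ...
                  + sum(gammaln(N + 1), 2));

% First run: score every predecessor, remember the k best.
full_scores = arrayfun(@(p) k2log(counts(old_data, p)), preds);
[~, idx] = sort(full_scores, 'descend');
shortlist = preds(idx(1:k));

% Incremental run: rescore only the shortlist on the accumulated data.
all_data = [old_data, new_data];
inc_scores = arrayfun(@(p) k2log(counts(all_data, p)), shortlist);
[~, best] = max(inc_scores);
new_parent = shortlist(best);

fprintf('full search: %d evaluations, incremental: %d\n', ...
        numel(preds), numel(shortlist));
```

With only five predecessors the saving is small, but the number of evaluations in the incremental pass stays at 4 per step no matter how many nodes precede the child in the ordering.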

Enough talk, here is the code:

function dag = learn_struct_K2(data, ns, order, varargin)
% LEARN_STRUCT_K2 Greedily learn the best structure compatible with a fixed node ordering
% best_dag = learn_struct_K2(data, node_sizes, order, ...)
%
% data(i,m) = value of node i in case m (can be a cell array).
% node_sizes(i) is the size of node i.
% order(i) is the i'th node in the topological ordering.
%
% The following optional arguments can be specified in the form of name/value pairs:
% [default value in brackets]
%
% max_fan_in - this is the largest number of parents we allow per node [N]
% scoring_fn - 'bayesian' or 'bic' [ 'bayesian' ]
%              Currently, only networks with all tabular nodes support Bayesian scoring.
% type       - type{i} is the type of CPD to use for node i, where the type is a string
%              of the form 'tabular', 'noisy_or', 'gaussian', etc. [ all cells contain 'tabular' ]
% params     - params{i} contains optional arguments passed to the CPD constructor for node i,
%              or [] if none.  [ all cells contain {'prior', 1}, meaning use uniform Dirichlet priors ]
% discrete   - the list of discrete nodes [ 1:N ]
% clamped    - clamped(i,m) = 1 if node i is clamped in case m [ zeros(N, ncases) ]
% verbose    - 'yes' means display output while running [ 'no' ]
%
% e.g., dag = learn_struct_K2(data, ns, order, 'scoring_fn', 'bic', 'params', [])
%
% To be backwards compatible with BNT2, you can also specify arguments as follows
%   dag = learn_struct_K2(data, node_sizes, order, max_fan_in)    
%
% This algorithm is described in
% - Cooper and Herskovits,  "A Bayesian method for the induction of probabilistic
%      networks from data", Machine Learning Journal 9:308--347, 1992

[n, ncases] = size(data);

% set default params
type = cell