Infomap: the core questions


Original post


License: CC 4.0 BY-SA

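This article analyzes the Louvain, LPA and Infomap algorithms in depth, discusses the theoretical foundations and practical techniques of community detection, and compares the strengths and weaknesses of the different algorithms to help you master the core of community-structure analysis.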

How does the Infomap algorithm basically work?
What do random walks have to do with it? What role exactly does the random walk play?
What is the map equation, and what (clearly) is the difference from modularity optimization? (The paper gives an example in Fig. 3, but I didn't get it.)
The homepage lists 2 improvements: submodule movements and single-node movements. Why are they used to improve the algorithm, and why can't merged modules be separated?

Homepage: http://www.mapequation.org/code.html

Paper: http://www.mapequation.org/assets/publications/EurPhysJ2010Rosvall.pdf

Here is where the optimization algorithm comes onto the scene: if you use too few modules, you are effectively back to using an individual codeword for every node in one global codebook; if you use too many modules, the overhead of the prefix codewords that announce which module the walker is in becomes too large. So we need to find the optimal partition between these two extremes, one that assigns nodes to modules such that the information needed to describe the movement of our random walkers is minimized (equation 1 in their paper).
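To make the trade-off above concrete, here is a minimal sketch that scores a few fixed partitions with the two-level map equation, L(M) = q↷H(Q) + Σ_i p↻^i H(P^i), rewritten in its expanded x·log2(x) form. It assumes an undirected, unweighted network without self-loops, so the random walker's visit rates are simply degree / 2m and no teleportation is needed (directed networks would need it). The function names `plogp` and `map_equation` and the toy graph `two_triangles` are hypothetical illustrations, not part of the Infomap software's API, and the sketch only evaluates given partitions; it does not perform Infomap's greedy search.

```python
import math
from collections import defaultdict


def plogp(x: float) -> float:
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, partition):
    """Two-level map equation L(M), in bits per step, for an undirected,
    unweighted graph without self-loops or duplicate edges (assumption).

    edges: list of (u, v) node pairs
    partition: dict mapping every node to a module id
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Stationary visit rate of a random walker on an undirected graph: p_a = d_a / 2m
    p = {node: d / two_m for node, d in degree.items()}

    # Exit rate q_i: per-step probability that the walker leaves module i.
    # In steady state each direction of each edge is traversed with probability 1/2m.
    exit_rate = defaultdict(float)
    for u, v in edges:
        if partition[u] != partition[v]:
            exit_rate[partition[u]] += 1 / two_m
            exit_rate[partition[v]] += 1 / two_m

    # Total visit rate of the nodes inside each module
    module_flow = defaultdict(float)
    for node, rate in p.items():
        module_flow[partition[node]] += rate

    q_total = sum(exit_rate.values())
    # Expanded form of L(M) = q H(Q) + sum_i p_i H(P^i)
    return (
        plogp(q_total)
        - 2 * sum(plogp(q) for q in exit_rate.values())
        - sum(plogp(rate) for rate in p.values())
        + sum(plogp(exit_rate.get(i, 0.0) + flow) for i, flow in module_flow.items())
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
candidates = {
    "one module": {n: 0 for n in range(6)},
    "two triangles": {n: n // 3 for n in range(6)},
    "singletons": {n: n for n in range(6)},
}
for name, partition in candidates.items():
    print(f"{name}: {map_equation(two_triangles, partition):.3f} bits")
```

Running it should print about 2.557 bits for the one-module partition (just the entropy of the visit rates, i.e. one codeword per node), 4.557 bits for the singleton partition (module codewords have to be spent at every step), and the shortest description, about 2.321 bits, for the two-triangle partition in between.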

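For a network with n nodes, the number of communities can be anywhere in the range [1, n], and the total number of possible partitions (how many communities there are and which nodes make up each one; even with the same number of communities, the membership of each community can differ) is the Bell number B_n. (In combinatorics, the Bell numbers count the partitions of a set. Named after the mathematician Eric Temple Bell, they form an integer sequence starting from B0 = B1 = 1; the first few Bell numbers are 1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, … (OEIS A000110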