Some problems encountered when calling karateclub's BigClam

Article link: https://blog.youkuaiyun.com/Blues_86/article/details/127930882

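This post walks through community detection with the BigClam algorithm: it resolves an error caused by non-consecutive node numbering and then runs the algorithm successfully on a complex network.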


My complex networks course requires using BigClam for community detection.

from karateclub import BigClam

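The teacher had already provided the import, but at first I didn't really know how to use it. Fortunately a classmate found a similar call, so I gave it a try.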

# G is an undirected graph that has already been built
model = BigClam()
model.fit(G)
Big_ms = model.get_memberships()

Running this raised an error, so I went to check the source code:

karateclub.community_detection.overlapping.bigclam — karateclub documentation

def fit(self, graph: nx.classes.graph.Graph):
        """
        Fitting a BigClam clustering model.

        Arg types:
            * **graph** *(NetworkX graph)* - The graph to be clustered.
        """
        self._set_seed()
        graph = self._check_graph(graph)
        number_of_nodes = graph.number_of_nodes()
        self._initialize_features(number_of_nodes)
        nodes = [node for node in graph.nodes()]
        for i in range(self.iterations):
            random.shuffle(nodes)
            for node in nodes:
                nebs = [neb for neb in graph.neighbors(node)]
                neb_features = self._embedding[nebs, :]
                node_feature = self._embedding[node, :]
                gradient = self._calculate_gradient(node_feature, neb_features)
                self._do_updates(node, gradient, node_feature)

This makes it clear: BigClam inherits from Estimator, and the error is raised inside the call to _check_graph(). So next I looked at the Estimator source: karateclub/estimator.py at master · benedekrozemberczki/karateclub · GitHub

    def _check_indexing(graph: nx.classes.graph.Graph):
        """Checking the consecutive numeric indexing."""
        numeric_indices = [index for index in range(graph.number_of_nodes())]
        node_indices = sorted([node for node in graph.nodes()])

        assert numeric_indices == node_indices, "The node indexing is wrong."

    def _check_graph(self, graph: nx.classes.graph.Graph) -> nx.classes.graph.Graph:
        """Check the Karate Club assumptions about the graph."""
        self._check_indexing(graph)
        graph = self._ensure_integrity(graph)

        return graph

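That settles it: for my graph G, numeric_indices and node_indices do not match.

The reason is that numeric_indices simply counts from 0 up to the number of nodes minus one, whereas node_indices holds the actual node names. I built G from a list produced by some data processing, in which every node carries its own name, so of course the two don't match.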

numeric_indices = [index for index in range(G.number_of_nodes())]
node_indices = sorted([node for node in G.nodes()])

print(numeric_indices)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 2707]
print(node_indices)
# [35, 40, 114, 117, 128, 130, 164, 288, 424, 434, ..., 1155073]
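
The same comparison can be wrapped in a small helper that checks any collection of node names before it is handed to karateclub, and that builds the name-to-index mapping needed to fix it. This is only a minimal sketch in plain Python; `has_consecutive_indexing` and `build_index_map` are hypothetical names made up for illustration, not part of karateclub.

```python
def has_consecutive_indexing(nodes):
    """Return True if the node labels are exactly 0, 1, ..., n-1."""
    labels = list(nodes)
    try:
        return sorted(labels) == list(range(len(labels)))
    except TypeError:
        # Mixed, unorderable label types can never form a clean 0..n-1 range.
        return False


def build_index_map(nodes):
    """Map each original label to its rank in the sorted label list."""
    return {label: index for index, label in enumerate(sorted(set(nodes)))}


original_labels = [35, 40, 114, 117, 128]
print(has_consecutive_indexing(original_labels))  # False

mapping = build_index_map(original_labels)
print(mapping)  # {35: 0, 40: 1, 114: 2, 117: 3, 128: 4}
print(has_consecutive_indexing(mapping.values()))  # True
```

With a NetworkX graph, the input would be `G.nodes()`.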

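I then skimmed the rest of the source. Roughly, as a preparation step it adds a self-loop to every node of G, and those self-loops are added by iterating over the node count (0 to n-1) rather than over the node names. So the name of every node in G has to equal its index in the sorted node list (no idea why it is written this way... it feels rather clumsy). The fit() code quoted earlier points the same way: it indexes self._embedding directly with the node and its neighbors, so node names have to be valid row indices.

There was nothing for it but to rework the original data and build a new graph: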
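
To see why the check matters, here is a small sketch of what adding self-loops by position does to a graph whose names are not 0..n-1. It only imitates the behavior described above on a toy graph and does not call karateclub; the three-node graph is made up for illustration. NetworkX's `add_edges_from` silently creates any endpoint that does not exist yet, so the graph gains extra nodes instead of self-loops on the real ones.

```python
import networkx as nx

# A tiny graph whose node names are not 0..n-1, like the Cora paper IDs.
G = nx.Graph()
G.add_edges_from([(35, 40), (40, 114)])

# Add self-loops by position (0..n-1) rather than by node name.
n = G.number_of_nodes()
G.add_edges_from((index, index) for index in range(n))

print(sorted(G.nodes()))    # [0, 1, 2, 35, 40, 114]
print(G.number_of_nodes())  # 6
```

The three original nodes still have no self-loop, and three new nodes have appeared, which is presumably why the library asserts on the indexing first.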

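import networkx as nx
import numpy as np
import pandas as pd

# c1 is the cited paper and c2 is the citing paper, so each edge is c2 -> c1
df1 = pd.read_csv("cora.cites", header=None, names=["c1", "c2"], sep='\t')

arr = np.array(df1)[:, [1, 0]]  # swap columns to make directed edges easy to build; only an undirected graph is needed for now
edgeList = arr.tolist()
noteSet = set(df1['c1']) | set(df1['c2'])  # union of the two columns
n_num = len(noteSet)  # number of distinct nodes

# BigClam needs node names 0..n_num-1, so the nodes have to be renamed
noteSet_bc = [index for index in range(n_num)]

numeric_indices = np.array(noteSet_bc)
node_indices = np.array(sorted(noteSet))

# Replace every original node name with its rank in the sorted name list
edgeList_bc = np.copy(edgeList)
for key, value in zip(node_indices, numeric_indices):
    edgeList_bc[arr == key] = value

# Build the new graph used for BigClam
G_bc = nx.Graph()
G_bc.add_nodes_from(noteSet_bc)
G_bc.add_edges_from(edgeList_bc.tolist())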
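
If a graph G with the original names already exists, NetworkX can do the renaming itself. The sketch below is an alternative to the manual loop, not what this post actually ran: it uses `nx.convert_node_labels_to_integers` with `ordering="sorted"`, so that each node gets its rank in the sorted name list, and it stores the old name in a node attribute. The attribute name `orig_name` and the toy graph are made up for illustration.

```python
import networkx as nx

# Toy graph with non-consecutive node names.
G = nx.Graph()
G.add_edges_from([(35, 40), (40, 114), (114, 1155073)])

# Relabel to 0..n-1 in sorted order and remember the original names.
G_bc = nx.convert_node_labels_to_integers(
    G, first_label=0, ordering="sorted", label_attribute="orig_name"
)
index_to_name = nx.get_node_attributes(G_bc, "orig_name")
edges = sorted(tuple(sorted(edge)) for edge in G_bc.edges())

print(sorted(G_bc.nodes()))  # [0, 1, 2, 3]
print(edges)                 # [(0, 1), (1, 2), (2, 3)]
print(index_to_name[3])      # 1155073
```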

Running the code again, it goes through.

model = BigClam()
model.fit(G_bc)
Big_ms = model.get_memberships()
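
Since the model only ever sees the renamed nodes, the result is keyed by the new indices. A short sketch of translating it back to the original names follows. It assumes `get_memberships()` returns a dict of the form {node index: community index}; the `Big_ms` values below are a made-up stand-in, not real model output, and `node_indices` is a shortened version of the sorted name list.

```python
from collections import defaultdict

# Sorted original names; position i holds the name of renamed node i.
node_indices = [35, 40, 114, 117]

# Stand-in for model.get_memberships(), assumed {node index: community index}.
Big_ms = {0: 1, 1: 1, 2: 0, 3: 1}

# Translate the renamed indices back to the original node names.
memberships_by_name = {
    node_indices[index]: community for index, community in Big_ms.items()
}
print(memberships_by_name)  # {35: 1, 40: 1, 114: 0, 117: 1}

# Group the original names by community.
communities = defaultdict(list)
for name, community in memberships_by_name.items():
    communities[community].append(name)
print(dict(communities))  # {1: [35, 40, 117], 0: [114]}
```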