Modularity maximization
Our goal is to find a measure that quantifies how many edges lie within groups in our network relative to the number of such edges expected on the basis of chance. A good division of nodes into communities is one that maximizes such a measure. Equivalently, we could use a measure that quantifies how many edges lie between groups in our network relative to the expected number of such edges; a good division of nodes into communities is then one that minimizes that measure. We will concentrate on the former measure, the modularity of a network.
Let us focus on undirected multi-graphs, that is, graphs that allow self-edges (edges involving the same node) and multi-edges (more than one simple edge between two vertices). A measure of modularity of a network is the number of edges that run between vertices of the same community minus the number of such edges we would expect to find if the configuration model is assumed, that is, if edges were positioned at random while preserving the vertex degrees. Let us denote by $c_i$ the community of vertex $i$, and let $\delta(c_i,c_j) = 1$ if $c_i = c_j$ and $\delta(c_i,c_j) = 0$ otherwise. Hence, the number of edges that run between vertices of the same group is:
$$\sum_{(i,j) \in E} \delta(c_i, c_j) = \frac{1}{2} \sum_{i,j} A_{i,j} \delta(c_i, c_j)$$
where $E$ is the set of edges of the graph and $A_{i,j}$ is the actual number of edges between $i$ and $j$, which is zero or more (notice that each undirected edge is represented by two pairs in the second sum, hence the factor one-half).
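The identity above can be checked on a small example. The following is a minimal sketch, not a reference implementation: the edge list, the community labels, and the helper names are hypothetical, and it assumes the common convention that a self-edge contributes 2 to the diagonal entry $A_{i,i}$ (a convention the text does not state explicitly, but one that makes the factor one-half work for self-edges too).

```python
def build_adjacency(n, edges):
    """Adjacency matrix of an undirected multigraph on n nodes.

    Each edge (i, j) adds 1 to A[i][j] and 1 to A[j][i], so a self-edge
    (i, i) adds 2 to A[i][i] (assumed convention).
    """
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] += 1
        adj[j][i] += 1
    return adj


def within_edges_from_list(edges, communities):
    """Left-hand side: count edges whose endpoints share a community."""
    return sum(1 for i, j in edges if communities[i] == communities[j])


def within_edges_from_matrix(adj, communities):
    """Right-hand side: one half of the sum of A[i][j] * delta(c_i, c_j)."""
    n = len(adj)
    total = sum(
        adj[i][j]
        for i in range(n)
        for j in range(n)
        if communities[i] == communities[j]
    )
    return total / 2


# Hypothetical multigraph: a double edge 0-1, an edge 1-2,
# a self-edge on node 2, and an edge 2-3 leaving the first community.
edges = [(0, 1), (0, 1), (1, 2), (2, 2), (2, 3)]
communities = [0, 0, 0, 1]
adj = build_adjacency(4, edges)

lhs = within_edges_from_list(edges, communities)
rhs = within_edges_from_matrix(adj, communities)
```

Both counts agree here: the four edges inside the first community are counted once each in the edge list and twice each in the matrix sum.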
The expected number of edges that run between vertices of the same group is:
$$\frac{1}{2} \sum_{i,j} \frac{k_i k_j}{2m} \delta(c_i, c_j)$$
where $k_i$ and $k_j$ are the degrees of $i$ and $j$, while $m$ is the number of edges of the graph. Notice that $k_i k_j / 2m$ is the expected number of edges between vertices $i$ and $j$ under the configuration model assumption. Indeed, consider a particular edge attached to vertex $i$. The probability that this edge goes to node $j$ is $k_j / 2m$, since the number of edges attached to $j$ is $k_j$ and the total number of edge ends in the network is $2m$ (the sum of all node degrees). Since node $i$ has $k_i$ edges attached to it, the expected number of edges between $i$ and $j$ is $k_i k_j / 2m$.
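As a quick numerical illustration of this argument, the sketch below tabulates $k_i k_j / 2m$ for a hypothetical degree sequence (two triangles joined by a single edge, so $m = 7$). The function name is an assumption made for this example. A useful sanity check is that each row of the table sums to $k_i$: the expected number of edge ends at node $i$ matches its actual degree.

```python
def expected_edge_counts(degrees):
    """Return the matrix of k_i * k_j / (2m) values for a degree sequence."""
    two_m = sum(degrees)  # total number of edge ends, equal to 2m
    return [[k_i * k_j / two_m for k_j in degrees] for k_i in degrees]


# Hypothetical degree sequence: two triangles joined by one edge (m = 7).
degrees = [2, 2, 3, 3, 2, 2]
expected = expected_edge_counts(degrees)

# Each row sums to the degree of the corresponding node.
row_sums = [sum(row) for row in expected]

# Expected number of edges between the two degree-3 nodes: 3 * 3 / 14.
bridge_expectation = expected[2][3]
```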
Hence the difference between the actual and expected number of edges connecting nodes of the same group, expressed as a fraction with respect to the total number of edges $m$, is called modularity, and is given by:
$$Q = \frac{1}{2m} \sum_{i,j} \left(A_{i,j} - \frac{k_i k_j}{2m}\right) \delta(c_i, c_j) = \frac{1}{2m} \sum_{i,j} B_{i,j} \delta(c_i, c_j)$$
where $B_{i,j} = A_{i,j} - \frac{k_i k_j}{2m}$ and $B$ is called the modularity matrix.
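The formula for $Q$ translates directly into code. What follows is a minimal sketch in plain Python, assuming the graph is given as a symmetric adjacency matrix (a list of lists) and the division as a list of community labels; the function names and the example graph are hypothetical. It runs in time quadratic in the number of nodes, which is fine for illustration but not for large networks.

```python
def modularity_matrix(adj):
    """B[i][j] = A[i][j] - k_i * k_j / (2m) for a symmetric adjacency matrix."""
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # sum of degrees equals 2m
    n = len(adj)
    return [
        [adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]


def modularity(adj, communities):
    """Q = (1 / 2m) * sum over same-community pairs (i, j) of B[i][j]."""
    two_m = sum(sum(row) for row in adj)
    b = modularity_matrix(adj)
    n = len(adj)
    total = sum(
        b[i][j]
        for i in range(n)
        for j in range(n)
        if communities[i] == communities[j]
    )
    return total / two_m


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] += 1
    adj[j][i] += 1

q_triangles = modularity(adj, [0, 0, 0, 1, 1, 1])  # one group per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])     # everything in one group
```

For this graph, splitting along the bridge gives $Q = 5/14$: the within-group entries of $A$ sum to 12, the expected term sums to 7, and $2m = 14$. Placing all nodes in a single group gives $Q = 0$, since the entries of $B$ sum to zero.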
The modularity $Q$ takes positive values if there are more edges between same-group vertices than expected, and negative values if there are fewer. Our goal is to find the partition of network nodes into communities that maximizes the modularity. Unfortunately, this is a computationally hard problem: modularity maximization is known to be NP-hard. It is therefore believed that any algorithm guaranteed to find the division with maximum modularity takes exponential time in the worst case, which makes such algorithms impractical for all but the smallest networks. Instead, we turn to heuristic algorithms, which attempt to maximize the modularity in an intelligent way and give reasonably good results quickly.
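To make the cost of exact optimization concrete, here is a sketch of a brute-force search. It is deliberately restricted to divisions into at most two groups, which is a simplification made for this example: the true optimum may use more groups, and the full search space is larger still. Even so, the loop visits $2^{n-1}$ labelings for $n$ nodes, which already rules it out beyond a few dozen nodes. All names and the example graph are hypothetical.

```python
from itertools import product


def modularity(adj, labels):
    """Q = (1 / 2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


def best_bipartition(adj):
    """Try every split into at most two groups and keep the best Q.

    Node 0 is pinned to group 0 so that each split is visited once,
    leaving 2 ** (n - 1) labelings to evaluate.
    """
    n = len(adj)
    best_q, best_labels = float("-inf"), None
    for rest in product((0, 1), repeat=n - 1):
        labels = (0,) + rest
        q = modularity(adj, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels


# Hypothetical example: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] += 1
    adj[j][i] += 1

best_q, best_labels = best_bipartition(adj)
```

On this six-node graph the search evaluates 32 labelings and recovers the two triangles as the best two-group division. Each additional node doubles the work, which is why heuristics take over on realistic networks.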