Event: CSDN 21-Day Learning Challenge
The Main Question
How to find important nodes in a network?

Node Importance
- Degree
- Average proximity to other nodes
- Fraction of shortest paths that pass through the node
At this point, these three definitions of importance are very informal. The goal of this video and the ones that follow is to make them precise, so that importance in a network can actually be measured.
Network Centrality
More generally, these measures **allow us to find nodes that prevent the network from breaking up.**
Centrality Measures:
- Degree centrality
- Closeness centrality
- Betweenness centrality
- Load centrality
- PageRank
- Katz centrality
- Percolation centrality
Degree centrality
Assumption: important nodes have many connections.
The most basic measure of centrality: number of neighbors
- Undirected networks: use degree
$$C_{deg}(v)=\frac{d_v}{|N|-1}$$
where $N$ is the set of nodes (so $|N|$ is the total number of nodes) and $d_v$ is the degree of node $v$.
The value lies in $[0,1]$: it is 0 when $v$ is an isolated node, and 1 when $v$ is connected to every other node.
import networkx as nx
G = nx.karate_club_graph()
# Relabel the nodes so that they run from 1 to 34 instead of 0 to 33
G = nx.convert_node_labels_to_integers(G, first_label=1)
degCent = nx.degree_centrality(G)
print(type(degCent))
#>> <class 'dict'>  (the return value is a dict keyed by node)
print(degCent[34])
#>> 0.5151515151515151  (17/33)
print(degCent[33])
#>> 0.36363636363636365  (12/33)
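As a quick check of the degree centrality formula, the following sketch recomputes $d_v/(|N|-1)$ by hand on the same karate club graph and compares it with what `nx.degree_centrality` returns. The variable names `manual`, `builtin`, and `top` are illustrative only and do not come from the original notes.

```python
import networkx as nx

G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)

n = G.number_of_nodes()

# C_deg(v) = d_v / (|N| - 1), computed directly from the node degrees
manual = {v: d / (n - 1) for v, d in G.degree()}
builtin = nx.degree_centrality(G)

# The two should agree up to floating-point rounding
agree = all(abs(manual[v] - builtin[v]) < 1e-12 for v in G)

# Node with the highest degree centrality
top = max(manual, key=manual.get)
print(agree, top, G.degree(top))
```

Because the denominator $|N|-1$ is the same for every node, ranking nodes by degree centrality is the same as ranking them by raw degree.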
- Directed networks: use in-degree or out-degree

$$C_{indeg}(v)=\frac{d_v^{in}}{|N|-1}$$
where $N$ is the set of nodes and $d_v^{in}$ is the in-degree of node $v$.
# G is now a directed graph (nx.DiGraph) whose nodes are labeled with letters
indegCent = nx.in_degree_centrality(G)
print(indegCent['A'])
print(indegCent['C'])
$$C_{outdeg}(v)=\frac{d_v^{out}}{|N|-1}$$
where $N$ is the set of nodes and $d_v^{out}$ is the out-degree of node $v$.
outdegCent = nx.out_degree_centrality(G)
print(outdegCent['A'])
print(outdegCent['C'])
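The directed snippets above rely on a graph `G` with lettered nodes that is not defined in these notes. As a self-contained sketch, the toy graph `D` below (its nodes and edges are made up purely for illustration) shows both calls side by side and checks them against the in-degree and out-degree formulas.

```python
import networkx as nx

# Hypothetical toy graph: A -> B, A -> C, B -> C, D -> C
D = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")])

n = D.number_of_nodes()

in_cent = nx.in_degree_centrality(D)
out_cent = nx.out_degree_centrality(D)

# The same quantities computed by hand from the formulas
manual_in = {v: D.in_degree(v) / (n - 1) for v in D}
manual_out = {v: D.out_degree(v) / (n - 1) for v in D}

# C receives an edge from every other node; A sends edges to two of three
print(in_cent["C"], out_cent["A"])
```

In this toy graph, C has the highest in-degree centrality but an out-degree centrality of 0, which shows why the two measures can rank the same node very differently.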
Closeness Centrality
Assumption: important nodes are close to other nodes
$$C_{close}(v)=\frac{|N|-1}{\sum_{u\in N}d(v,u)}$$
where $N$ is the set of nodes in the graph and $d(v,u)$ is the length of the shortest path from $v$ to $u$.
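closeCent = nx.closeness_centrality(G)
print(type(closeCent))
#>> <class 'dict'>
print(closeCent[32])
#>> 0.5409836065573771
# Sum of the shortest-path lengths from node 32 to every other node
print(sum(nx.shortest_path_length(G, 32).values()))
#>> 61
# (|N| - 1) / 61 reproduces the value returned by closeness_centrality
print((len(G.nodes()) - 1) / 61)
#>> 0.5409836065573771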
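To make the role of shortest-path distances explicit, here is a minimal from-scratch sketch that uses only the standard library. It assumes an unweighted, connected graph stored as an adjacency dictionary. The helper names `bfs_distances` and `closeness` and the four-node path graph are hypothetical, chosen only for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adj, v):
    """C_close(v) = (|N| - 1) / sum of distances from v (connected graph assumed)."""
    total = sum(bfs_distances(adj, v).values())
    return (len(adj) - 1) / total if total > 0 else 0.0


# Hypothetical path graph: a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

end_node = closeness(adj, "a")     # distances 1, 2, 3
middle_node = closeness(adj, "b")  # distances 1, 1, 2
print(end_node, middle_node)
```

The node in the middle of the path has the smaller distance sum and therefore the higher closeness, which matches the assumption that important nodes are close to other nodes.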
Disconnected Nodes
How to measure the closeness centrality of a node when it cannot reach all other nodes?
Option 1
Only consider nodes that L can reach:
$$C_{close}(L)=\frac{|R(L)|}{\sum_{u\in R(L)}d(L,u)}$$
where $R(L)$ is the set of nodes that $L$ can reach.
Going back to the directed graph, L can only reach node M:
$$C_{close}(L)=\frac{1}{1}=1$$
Problem: a centrality of 1 is too high for a node that can reach only one other node!
Option 2
Consider only nodes that L can reach and normalize by the fraction of nodes L can reach:
$$C_{close}(L)=\left[\frac{|R(L)|}{|N|-1}\right]\frac{|R(L)|}{\sum_{u\in R(L)}d(L,u)}$$
$$C_{close}(L)=\left[\frac{1}{14}\right]\frac{1}{1}=0.071$$
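One thing to note about this new definition: the normalization changes nothing when the graph is connected (strongly connected, for a directed graph). Every node then reaches all $|N|-1$ others, so the factor in brackets equals 1 and the original definition is recovered.
The normalization only matters when the graph has several connected components, or when a directed graph is not strongly connected.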
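# wf_improved=True (the default) applies the Option 2 normalization
closeCent = nx.closeness_centrality(G, wf_improved=True)
# wf_improved=False gives the unnormalized Option 1 value
closeCent = nx.closeness_centrality(G, wf_improved=False)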
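The sketch below reproduces the Option 1 and Option 2 numbers for L on a made-up 15-node directed graph in which L can reach only M. The graph, the filler node names `n0` to `n12`, and the helper `closeness_out` are assumptions for illustration. The lecture's actual graph is not given in these notes. One caveat, to the best of my knowledge: for directed graphs, recent NetworkX releases measure closeness using distances *into* a node, so the sketch passes `G.reverse()` to get the outgoing distances used in the formulas here.

```python
import networkx as nx

# Hypothetical graph: L -> M, plus 13 other nodes on a directed cycle
others = [f"n{i}" for i in range(13)]
G = nx.DiGraph()
G.add_nodes_from(["L", "M"] + others)
G.add_edge("L", "M")
G.add_edges_from(zip(others, others[1:] + others[:1]))


def closeness_out(G, v, normalize):
    """Closeness over the nodes v can reach, optionally scaled by |R(v)| / (|N| - 1)."""
    dist = nx.single_source_shortest_path_length(G, v)
    reachable = {u: d for u, d in dist.items() if u != v}
    if not reachable:
        return 0.0
    value = len(reachable) / sum(reachable.values())
    if normalize:
        value *= len(reachable) / (len(G) - 1)
    return value


option1 = closeness_out(G, "L", normalize=False)
option2 = closeness_out(G, "L", normalize=True)

# Same quantities via NetworkX, on the reversed graph (see the caveat above)
R = G.reverse()
nx_option1 = nx.closeness_centrality(R, u="L", wf_improved=False)
nx_option2 = nx.closeness_centrality(R, u="L", wf_improved=True)

print(option1, option2, nx_option1, nx_option2)
```

With the normalization switched on, L drops from the maximum possible score to roughly 0.071, which better reflects that it reaches only one of the 14 other nodes.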