Why Link Prediction?

Link prediction aims to estimate the likelihood that a link exists between two nodes, based on the observed links and the attributes of the nodes. It helps in analyzing networks with missing data. In many biological networks, such as food webs, protein–protein interaction networks and metabolic networks, whether a link between two nodes exists must be demonstrated by field and/or laboratory experiments, which are usually very costly. Our knowledge of these networks is therefore very limited: for example, 80% of the molecular interactions in yeast cells [1] and 99.7% of those in human cells [2] are still unknown. Instead of blindly checking all possible interactions, predicting from the known interactions and focusing on the links most likely to exist can sharply reduce experimental costs, provided the predictions are accurate enough.

In addition, link prediction algorithms can be used to predict the links that may appear in the future in evolving networks, with applications such as friend recommendation in online social networks and product recommendation on e-commerce web sites [3]. Further applications of link prediction include the identification of spurious links, estimation of the competing mechanisms in evolving networks, classification of nodes, and so on.
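To make the idea of ranking candidate links concrete, here is a minimal sketch in plain Python. It assumes an undirected, unweighted network and uses the Jaccard similarity of two nodes' neighbor sets as the likelihood score. That is only one simple choice among many similarity indices, and the toy edge list and the function name `jaccard_scores` are hypothetical.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Score every unlinked node pair by the Jaccard similarity of its neighbor sets."""
    # Build the neighbor set of each node from the observed (undirected) links.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already an observed link, nothing to predict
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(shared) / len(union)
    return scores


# Hypothetical toy network of observed links.
observed = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

scores = jaccard_scores(observed)
# Candidate links, most likely first; these would be checked experimentally first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In this toy network the pair A and D shares two neighbors and is ranked first, so it would be the first candidate to verify.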
 
Thanks to its fundamental theoretical importance in network science and its wide applications, the study of link prediction has attracted much attention recently. As supporting evidence, a recent survey [4] accumulated 515 citations in only about four years, up to Aug. 2015, and a number of papers on this topic have been published in prestigious journals [5-9].
 
Link prediction will remain a very hot topic in the near future. The major directions in the next five years include (but are not limited to) the following four. (i) Some fundamental theoretical issues in link prediction, such as the link predictability of networks. (ii) Link prediction for different kinds of networks, such as directed networks and weighted networks, including the prediction of a link's direction and weight in addition to its existence. (iii) Link prediction algorithms that use novel kinds of data, such as link prediction in social networks with human mobility data. (iv) The connection of link prediction algorithms to other related topics, such as understanding network evolution and detecting community structure.
 
[1] H. Yu, et al., Science 322 (2008) 104.
[2] L. A. N. Amaral, PNAS 105 (2008) 6795.
[3] L. Lü, et al., Physics Reports 519 (2012) 1.
[4] L. Lü, et al., Physica A 390 (2011) 1150.
[5] A. Clauset, et al., Nature 453 (2008) 98.
[6] R. Guimera, et al., PNAS 106 (2009) 22073.
[7] B. Barzel, et al., Nature Biotechnology 31 (2013) 720.
[8] P. Bastiaens, et al., Nature Biotechnology 33 (2015) 336.
[9] L. Lü, et al., PNAS 112 (2015) 2325.


## Assignment

### **Question 1: Tuning Logistic Regression Regularization (C)**

The notebook trains a logistic regression model with a default regularization strength. The parameter `C` in logistic regression controls the **trade-off between bias and variance**.

- **Task**: Change the `C` value in `LogisticRegression(C=...)` to **0.01, 1 (default), and 100**.
- **Hint**: Higher `C` means less regularization, and lower `C` means more regularization.
- **Deliverable**: A short explanation of how the model’s **precision, recall, and F1-score** change with different values of `C`.

---

### **Question 2: Changing Graph Train-Test Edge Split Strategy**

The notebook splits **edges** into train and test sets for link prediction, but the way edges are split may impact model performance.

- **Task**: Modify the train-test split strategy by testing **random edge splits of 60%-40%, 70%-30%, and 80%-20%**.
- **Hint**: Look for `train_test_split()` applied to the **edges** and change the train-test proportion.
- **Deliverable**: A **comparison table** showing the AUC scores for each split setting.

---

### **Question 3: LightGBM Model**

Modify the existing model to answer the following questions:

#### 1. **Feature Utilization**

- Increase the `feature_fraction` from `0.5` to `0.8`.
- Does this improve or degrade the model's AUC score?

#### 2. **Bagging Strategy**

- Adjust the `bagging_fraction` from `0.5` to `0.7` while keeping `bagging_freq` at `20`.
- Observe how this affects the final performance.

#### **Deliverables**

- **Record and compare** the AUC score, total boosting rounds, and any performance changes after each modification.
- **Briefly explain** whether the modifications had a positive or negative effect and why.

---

### **Question 4: Network Measurement Task**

#### **Dataset: Les Misérables Character Network**

For this exercise, we will use the **Les Misérables Character Network**, which represents the co-occurrences of characters in Victor Hugo's novel. Each node corresponds to a character, and an edge between two nodes indicates that the characters appear together in the same chapter. The dataset comprises 77 nodes and 254 edges.

#### **Your Tasks**

Using the **Les Misérables Character Network**, perform the following analyses:

1. **Degree Centrality**:
   - Calculate the **degree** of each node to determine the number of connections each character has.
   - Identify the top 5 characters with the highest degree centrality.
2. **Clustering Coefficient**:
   - Compute the **clustering coefficient** for each node to assess the tendency of characters to cluster together.
   - Determine the average clustering coefficient of the network.
3. **Betweenness Centrality**:
   - Calculate the **betweenness centrality** for each node to find characters that serve as bridges between different parts of the network.
   - Identify the top 5 characters with the highest betweenness centrality.
4. **PageRank**:
   - Compute the **PageRank** of each node to evaluate the influence of each character within the network.
   - List the top 5 characters according to their PageRank scores.
5. **Network Density**:
   - Calculate the **density** of the network to understand how interconnected the characters are.
   - Interpret what this density value implies about the network's structure.
6. **Shortest Path Analysis**:
   - Determine the **average shortest path length** in the network.
   - Find the shortest path between two specific characters, such as *Jean Valjean* and *Cosette*.
7. **Connected Components**:
   - Identify all **connected components** within the network.
   - Ascertain whether the network is fully connected or if there are isolated sub-networks.

#### **Deliverables**

- Perform all the analyses mentioned above.
- Output the results in a clear and concise manner.

How should this be done?
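For Question 4, one possible starting point is the sketch below. It assumes `networkx` is installed and uses the copy of the dataset that networkx bundles as `nx.les_miserables_graph()`, in which characters carry short labels such as `"Valjean"` and `"Cosette"`. The helper `top5` is a hypothetical name introduced here. The edges in that copy carry co-occurrence weights. This sketch ignores them and treats the graph as unweighted, which is an assumption rather than a requirement of the assignment.

```python
import networkx as nx

# Bundled co-appearance network; edges carry a 'weight' attribute that this
# sketch deliberately ignores (unweighted analysis is an assumption).
G = nx.les_miserables_graph()


def top5(scores):
    """Return the five highest-scoring (node, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]


# 1. Degree (raw neighbor count) and normalized degree centrality.
degree = dict(G.degree())
degree_centrality = nx.degree_centrality(G)
print("Top 5 by degree:", top5(degree))

# 2. Local clustering coefficient per node, and the network average.
clustering = nx.clustering(G)
avg_clustering = nx.average_clustering(G)
print("Average clustering:", round(avg_clustering, 4))

# 3. Betweenness centrality (unweighted shortest paths by default).
betweenness = nx.betweenness_centrality(G)
print("Top 5 by betweenness:", top5(betweenness))

# 4. PageRank; weight=None ignores edge weights. Recent networkx versions
#    need SciPy for this call.
pagerank = nx.pagerank(G, weight=None)
print("Top 5 by PageRank:", top5(pagerank))

# 5. Density: fraction of possible node pairs that are actually linked.
density = nx.density(G)
print("Density:", round(density, 4))

# 7. Connected components (computed before step 6, which depends on it).
components = list(nx.connected_components(G))
print("Number of components:", len(components))

# 6. The average shortest path length is only defined for a connected graph.
avg_path = nx.average_shortest_path_length(G) if nx.is_connected(G) else None
print("Average shortest path length:", avg_path)
path = nx.shortest_path(G, "Valjean", "Cosette")
print("Valjean -> Cosette:", path)
```

Each `top5(...)` call answers one of the "top 5 characters" items. A density close to 0 indicates a sparse network in which most pairs of characters never share a chapter.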